The invention concerns genes and methods for expressing eukaryotic and viral proteins at high levels in eukaryotic cells.

Background of the Invention

Expression of eukaryotic gene products in prokaryotes is sometimes limited by the presence of codons that are infrequently used in E. coli. Expression of such genes can be enhanced by systematic substitution of the endogenous codons with codons overrepresented in highly expressed prokaryotic genes (Robinson et al. 1984). It is commonly supposed that rare codons cause pausing of the ribosome, which leads to a failure to complete the nascent polypeptide chain and an uncoupling of transcription and translation. The portion of the mRNA 3' of the stalled ribosome is exposed to cellular ribonucleases, which decreases the stability of the transcript.

Summary of the Invention
The invention features a synthetic gene encoding a protein normally expressed in mammalian cells wherein at least one non-preferred or less preferred codon in the natural gene encoding the mammalian protein has been replaced by a preferred codon encoding the same amino acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg). Less preferred codons are: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val (gtc). All codons which do not fit the description of preferred codons or less preferred codons are non-preferred codons.
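The three codon classes defined above can be captured as two lookup sets, with every other codon falling through to "non-preferred". The following Python sketch follows the definitions literally; the names `PREFERRED_CODONS`, `LESS_PREFERRED_CODONS` and `classify_codon` are illustrative only, and sequences are written in the DNA alphabet (T for U). Because no preferred codon is listed for Glu, Met, or Trp, their codons land in the non-preferred class under this literal reading.

```python
# Codon classes as listed in the text (DNA alphabet).
PREFERRED_CODONS = frozenset({
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GTG",
})
LESS_PREFERRED_CODONS = frozenset({"GGG", "ATT", "CTC", "TCC", "GTC"})


def classify_codon(codon: str) -> str:
    """Return 'preferred', 'less preferred' or 'non-preferred' for one codon."""
    codon = codon.upper().replace("U", "T")  # accept RNA or lowercase input
    if len(codon) != 3 or set(codon) - set("ACGT"):
        raise ValueError(f"not a codon: {codon!r}")
    if codon in PREFERRED_CODONS:
        return "preferred"
    if codon in LESS_PREFERRED_CODONS:
        return "less preferred"
    return "non-preferred"  # every codon not listed above
```

For example, `classify_codon("gcc")` gives "preferred", `classify_codon("GGG")` gives "less preferred", and `classify_codon("GAG")` (Glu) gives "non-preferred".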
By "protein normally expressed in mammalian cells" is meant a protein which is expressed in mammalian cells under natural conditions. The term includes genes in the mammalian genome such as Factor VIII, Factor IX, interleukins, and other proteins. The term also includes genes which are expressed in a mammalian cell under disease conditions, such as oncogenes, as well as genes which are encoded by a virus (including a retrovirus) and which are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic gene is capable of expressing said mammalian protein at a level which is at least 110%, 150%, 200%, 500%, 1,000%, or 10,000% of that expressed by said natural gene in an in vitro mammalian cell culture system under identical conditions (i.e., same cell type, same culture conditions, same expression vector).
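The expression levels recited above amount to a ratio between the synthetic and natural gene products measured in the same system. A minimal sketch, assuming both levels have already been quantified in the same units (for example ng/ml from an antibody-based assay); the function names are hypothetical.

```python
from __future__ import annotations

# Expression levels recited in the preferred embodiments, in percent of the natural gene.
THRESHOLDS_PERCENT = (110, 150, 200, 500, 1_000, 10_000)


def relative_expression_percent(synthetic: float, natural: float) -> float:
    """Synthetic-gene expression as a percentage of natural-gene expression."""
    if natural <= 0:
        raise ValueError("natural-gene expression must be positive")
    return 100.0 * synthetic / natural


def thresholds_met(synthetic: float, natural: float) -> list[int]:
    """Recited levels reached by one pair of measurements."""
    pct = relative_expression_percent(synthetic, natural)
    return [t for t in THRESHOLDS_PERCENT if pct >= t]
```

For example, hypothetical readings of 250 ng/ml for the synthetic gene and 10 ng/ml for the natural gene give 2,500%, which meets every listed level except 10,000%.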

Suitable cell culture systems for measuring expression of the synthetic gene and corresponding natural gene are described below. Other suitable expression systems employing mammalian cells are well known to those skilled in the art and are described in, for example, the standard molecular biology reference works noted below. Vectors suitable for expressing the synthetic and natural genes are described below and in the standard reference works described below. By "expression" is meant protein expression. Expression can be measured using an antibody specific for the protein of interest. Such antibodies and measurement techniques are well known to those skilled in the art. By "natural gene" is meant the gene sequence which naturally encodes the protein.

In other preferred embodiments at least 10%, 20%, 
30%, 40%, 50%, 60%, 70%, 80%, or 90% of the codons in the 
natural gene are non-preferred codons. 
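The percentages above refer to the share of non-preferred codons in the natural gene. A minimal Python sketch of that count, assuming an in-frame coding sequence; it applies the definition literally, so stop codons and the codons for Glu, Met and Trp are all counted as non-preferred (a practical analysis might choose to exclude them). Names are hypothetical.

```python
from __future__ import annotations

# Preferred and less preferred codons as listed in the text (DNA alphabet).
PREFERRED_OR_LESS = frozenset({
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GTG",
    "GGG", "ATT", "CTC", "TCC", "GTC",
})


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper().replace("U", "T")
    if not seq or len(seq) % 3:
        raise ValueError("sequence length must be a positive multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def percent_non_preferred(seq: str) -> float:
    """Percentage of codons that are neither preferred nor less preferred."""
    codons = split_codons(seq)
    non_preferred = sum(1 for c in codons if c not in PREFERRED_OR_LESS)
    return 100.0 * non_preferred / len(codons)
```

For the toy sequence GCC GCA AGC TTT, two of the four codons (GCA and TTT) are non-preferred, giving 50%.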
In a preferred embodiment the protein is a retroviral protein. In a more preferred embodiment the protein is a lentiviral protein. In other preferred embodiments the protein is ..., gp120, or gp160. In other preferred embodiments the protein is a human protein.

The invention also features a method for preparing a synthetic gene encoding a protein normally expressed by mammalian cells. The method includes identifying non-preferred and less preferred codons in the natural gene encoding the protein and replacing one or more of the non-preferred and less preferred codons with a preferred codon encoding the same amino acid as the replaced codon.
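The replacement step of the method can be sketched as a per-codon lookup: translate each codon with the standard genetic code and substitute the preferred codon for that amino acid. This is an illustrative Python sketch under stated assumptions (in-frame input, DNA alphabet, every eligible codon replaced); the names are hypothetical and it is not presented as the exact procedure used for the synthetic genes described below.

```python
from __future__ import annotations

# Standard genetic code (DNA alphabet), generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Preferred codons as listed in the text; Glu, Met and Trp have no entry.
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons, rejecting non-ACGT input."""
    seq = seq.upper().replace("U", "T")
    if not seq or len(seq) % 3 or set(seq) - set("ACGT"):
        raise ValueError("expected an in-frame DNA/RNA coding sequence")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """One-letter protein translation ('*' marks a stop codon)."""
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


def optimize(seq: str) -> str:
    """Replace each codon with the preferred codon for the same amino acid."""
    # Codons without a listed preferred synonym (Glu, Met, Trp, stops) are kept.
    return "".join(PREFERRED.get(CODON_TO_AA[c], c) for c in split_codons(seq))


natural = "GCAAGAGAATTATAA"  # hypothetical toy ORF: Ala-Arg-Glu-Leu-stop
assert translate(optimize(natural)) == translate(natural)  # protein unchanged
```

Translating before and after substitution is a cheap check that only synonymous changes were made.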

Under some circumstances (e.g., to permit introduction of a restriction site), it may be desirable to replace a non-preferred codon with a less preferred codon rather than a preferred codon.

It is not necessary to replace all less preferred or non-preferred codons with preferred codons. Increased expression can be accomplished even with partial replacement.
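Partial replacement can be modelled by capping how many codons are changed. In this hypothetical sketch the cap is applied 5' to 3'; the text does not say which codons should be replaced first, so that ordering is purely an assumption.

```python
# Standard genetic code (DNA alphabet), generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def partially_optimize(seq: str, max_replacements: int) -> str:
    """Replace at most `max_replacements` codons, scanning 5' to 3'.

    Codons that are already preferred, or whose amino acid has no listed
    preferred codon, are left alone and do not count toward the limit.
    """
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out, replaced = [], 0
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        best = PREFERRED.get(CODON_TO_AA[codon])
        if best is not None and best != codon and replaced < max_replacements:
            out.append(best)
            replaced += 1
        else:
            out.append(codon)
    return "".join(out)
```

With a cap of 2, the toy sequence GCA AGA TTA becomes GCC CGC TTA: the Leu codon is left as it was.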

In other preferred embodiments the invention features vectors (including expression vectors) comprising the synthetic gene.

By "vector" is meant a DNA molecule, derived, e.g., from a plasmid, bacteriophage, or mammalian or insect virus, into which fragments of DNA may be inserted or cloned. A vector will contain one or more unique restriction sites and may be capable of autonomous replication in a defined host or vehicle organism such that the cloned sequence is reproducible. Thus, by "expression vector" is meant any autonomous element capable of directing the synthesis of a protein. Such DNA expression vectors include mammalian plasmids and viruses.

The invention also features synthetic gene fragments which encode a desired portion of the protein. Such synthetic gene fragments are similar to the synthetic genes of the invention except that they encode only a portion of the protein. Such gene fragments preferably encode at least 50, 100, 150, or 500 contiguous amino acids of the protein.
In constructing the synthetic genes of the invention it may be desirable to avoid CpG sequences, as these sequences may cause gene silencing.
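Among the preferred codons listed above, only the Arg codon CGC contains a CpG internally, but many preferred codons end in C, so a CpG can also arise at a codon junction when the next codon begins with G. Scanning the whole sequence rather than codon by codon catches both cases. A minimal sketch; the function name is hypothetical.

```python
from __future__ import annotations


def cpg_positions(seq: str) -> list[int]:
    """0-based start positions of every CG dinucleotide, including codon junctions."""
    seq = seq.upper().replace("U", "T")
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
```

For example, TTC followed by GGC (Phe-Gly, both preferred codons) yields TTCGGC, with a CpG spanning the codon boundary at position 2.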

The codon bias present in the HIV gp120 envelope gene is also present in the gag and pol genes. Thus, replacement of a portion of the non-preferred and less preferred codons found in these genes with preferred codons should produce a gene capable of higher level expression. A large fraction of the codons in the human genes encoding Factor VIII and Factor IX are non-preferred codons or less preferred codons. Replacement of a portion of these codons with preferred codons should yield genes capable of higher level expression in mammalian cell culture. Conversely, it may be desirable to replace preferred codons in a naturally occurring gene with less preferred codons as a means of lowering expression.
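With the codon lists given earlier, lowering expression in this way is only defined for the five amino acids that have a listed less preferred codon (Gly, Ile, Leu, Ser, Val). A minimal sketch under that assumption; every other codon is left unchanged because the text names no less preferred alternative for it. Names are hypothetical.

```python
# Preferred codon -> less preferred synonym, from the lists in the text.
# Each pair encodes the same amino acid (Gly, Ile, Leu, Ser, Val).
PREFERRED_TO_LESS = {"GGC": "GGG", "ATC": "ATT", "CTG": "CTC", "AGC": "TCC", "GTG": "GTC"}


def deoptimize(seq: str) -> str:
    """Swap preferred codons for their listed less preferred synonyms."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join(PREFERRED_TO_LESS.get(c, c) for c in codons)
```

For example, GGC ATC GCC (Gly-Ile-Ala) becomes GGG ATT GCC; the Ala codon has no listed less preferred synonym and stays put.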

Standard reference works describing the general principles of recombinant DNA technology include Watson, J.D. et al., Molecular Biology of the Gene, Volumes I and II, the Benjamin/Cummings Publishing Company, Inc., publisher, Menlo Park, CA (1987); Darnell, J.E. et al., Molecular Cell Biology, Scientific American Books, Inc., publisher, New York, NY (1986); Old, R.W., et al., Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2d edition, University of California ...; ..., Cold Spring Harbor, NY (1989); and Current Protocols in Molecular Biology, Ausubel et al. (1989).
Figure 1 depicts the sequence of the synthetic gp120 (SEQ ID NO: 34) and a synthetic gp160 (SEQ ID NO: 35) gene in which codons have been replaced by those found in highly expressed human genes.

Figure 2 is a schematic drawing of the synthetic gp120 (HIV-1 MN) gene. The shaded portions marked V1 to V5 indicate hypervariable regions. The filled box indicates the CD4 binding site. A limited number of the unique restriction sites are shown: H (Hind3), Nh (NheI), P (PstI), Na (NaeI), M (MluI), R (EcoRI), A (AgeI), and No (NotI). The chemically synthesized DNA fragments which served as PCR templates are shown below the gp120 sequence, along with the locations of the primers used for their amplification.

Figure 3 is a photograph of the results of transient transfection assays used to measure gp120 expression. Shown is gel electrophoresis of immunoprecipitated supernatants of 293T cells transfected with plasmids expressing gp120 encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate (gp120mn), by the MN isolate modified by substitution of the endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or by the chemically synthesized gene encoding the MN variant with the human CD5 leader (syngp120mn). Supernatants were harvested following a 12 hour labeling period 60 hours post-transfection and immunoprecipitated with CD4:IgG1 fusion protein and protein A Sepharose.
Figure 4 is a graph depicting the results of ELISA assays used to measure protein levels in supernatants of transiently transfected 293T cells. Supernatants of 293T cells transfected with plasmids expressing gp120 encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate (gp120mn), by the MN isolate modified by substitution of the endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or by the chemically synthesized gene encoding the MN variant with the human CD5 leader (syngp120mn) were harvested after 4 days and tested in a gp120/CD4 ELISA. The level of gp120 is expressed in ng/ml.

Figure 5, panel A is a photograph of a gel illustrating the results of an immunoprecipitation assay used to measure expression of the native and synthetic gp120 in the presence of rev in trans and the RRE in cis. In this experiment 293T cells were transiently transfected by calcium phosphate coprecipitation of 10 μg of plasmid expressing: (A) the synthetic gp120MN sequence and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, or (C) the gp120 portion of HIV-1 IIIB and RRE in cis, all in the presence or absence of rev expression. The RRE constructs gp120IIIbRRE and syngp120mnRRE were generated using an EagI/HpaI RRE fragment cloned by PCR from an HIV-1 HXB2 proviral clone. Each gp120 expression plasmid was cotransfected with 10 μg of either pCMVrev or CDM7 plasmid DNA. Supernatants were harvested 60 hours post-transfection, immunoprecipitated with CD4:IgG fusion protein and protein A agarose, and run on a 7% reducing SDS-PAGE gel. The gel exposure time was extended to allow the induction of gp120IIIbRRE by rev to be demonstrated. Figure 5, panel B is a shorter exposure of a similar experiment in which syngp120mnRRE was cotransfected with or without pCMVrev. Figure 5, panel C is a schematic diagram of the constructs used in panel A.
Figure 6 is a comparison of the sequence of the wildtype rat THY-1 gene (wt) (SEQ ID NO: 37) and a synthetic rat THY-1 gene (env) (SEQ ID NO: ...) constructed by chemical synthesis and having the most prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic ratTHY-1env gene. The solid black box denotes the signal peptide. The shaded box denotes the sequences which direct attachment of a phosphatidylinositol glycan anchor. Unique restriction sites used for assembly of the THY-1 constructs are marked H (Hind3), ... The positions of the synthetic oligonucleotides employed in the construction are shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow cytometry analysis. In this experiment 293T cells were transiently transfected with either wildtype rat THY-1 (dark line), ratTHY-1 with envelope codons (light line), or vector only (dotted line). The 293T cells were transfected with the different expression plasmids by calcium phosphate coprecipitation and, 3 days after transfection, stained with the anti-ratTHY-1 monoclonal antibody OX7 followed by a polyclonal FITC-conjugated anti-mouse IgG antibody.

Figure 9, panel A is a photograph of a gel illustrating the results of immunoprecipitation analysis of supernatants of human 293T cells transfected with either syngp120mn (A) or a construct, syngp120mn.rTHY-1env, which has the rTHY-1env gene in the 3' untranslated region of the syngp120mn gene (B). The syngp120mn.rTHY-1env construct was generated by inserting a NotI adapter into the blunted Hind3 site of the rTHY-1env plasmid. Subsequently, a 0.5 kb NotI fragment containing the rTHY-1env gene was cloned into the NotI site of the syngp120mn plasmid and tested for correct orientation. Supernatants of 35S-labelled cells were harvested 72 hours post-transfection, precipitated with CD4:IgG fusion protein and protein A agarose, and run on a 7% reducing SDS-PAGE gel. Figure 9, panel B is a schematic diagram of the constructs used in the experiment depicted in panel A of this figure.

Description of the Preferred Embodiments 

Construction of a Synthetic gp120 Gene Having Codons Found in Highly Expressed Human Genes

A codon frequency table for the envelope precursor of the LAV subtype of HIV-1 was generated using software developed by the University of Wisconsin Genetics Computer Group. The results of that tabulation are contrasted in Table 1 with the pattern of codon usage by a collection of highly expressed human genes. For every amino acid encoded by degenerate codons, the most favored codon of the highly expressed genes is different from the most favored codon of the HIV envelope precursor.
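A codon frequency table of this kind gives, for each amino acid, the percentage of its occurrences encoded by each synonymous codon. The GCG software itself is not reproduced here; the following is a minimal Python sketch of the same tabulation for one in-frame sequence, with hypothetical function names.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Standard genetic code (DNA alphabet), generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_percent(seq: str) -> dict[str, float]:
    """Percentage of each amino acid's occurrences encoded by each observed codon."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TO_AA[codon]] += n
    return {
        codon: 100.0 * n / per_amino_acid[CODON_TO_AA[codon]]
        for codon, n in sorted(counts.items())
    }
```

For the toy sequence AGA AGA AGA CGC, all four codons encode Arg, so AGA accounts for 75% and CGC for 25%. Pooling several genes before tabulating is a matter of concatenating their in-frame coding sequences.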

Moreover, a simple rule describes the pattern of favored envelope codons wherever it applies: preferred codons maximize the number of adenine residues in the viral RNA. In all cases but one, this means that the codon in which the third position is A is the most frequently used. In the special case of serine, three codons equally contribute one A residue to the mRNA; together these three comprise 85% of the codons actually used in envelope transcripts. A particularly striking example of the A bias is found in the codon choice for arginine, in which the AGA triplet comprises 88% of all codons. In addition to the preponderance of A residues, a marked preference is seen for uridine among degenerate codons whose third residue must be a pyrimidine. Finally, the inconsistencies among the less frequently used variants can be accounted for by the observation that the dinucleotide CpG is underrepresented; thus the third position is less likely to be G whenever the second position is C, as in the codons for alanine, proline, serine and threonine, and the CGX triplets for arginine are hardly used at all.

TABLE 1
Codon frequency was calculated using the GCG program established by the University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases in which the particular codon is used. Codon usage frequencies of envelope genes of other HIV-1 virus isolates are comparable and show a similar bias.
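The "simple rule" described above can be made explicit: among the synonymous codons for an amino acid, keep those with the greatest number of A residues. This Python sketch only restates the rule from the genetic code; it does not reproduce the tabulated envelope frequencies. The function name is hypothetical.

```python
from __future__ import annotations

# Standard genetic code (DNA alphabet), generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def adenine_maximizing_codons(aa: str) -> list[str]:
    """Synonymous codons for `aa` (one-letter code) with the most A residues."""
    synonymous = [c for c, a in CODON_TO_AA.items() if a == aa.upper()]
    if not synonymous:
        raise ValueError(f"unknown amino acid {aa!r}")
    most = max(c.count("A") for c in synonymous)
    return sorted(c for c in synonymous if c.count("A") == most)
```

For arginine the rule singles out AGA, and for serine it returns the three codons (AGC, AGT, TCA) that each contribute one A, matching the special case noted above.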



In order to produce a gp120 gene capable of high level expression in mammalian cells, a synthetic gene encoding the gp120 segment of HIV-1 was constructed (syngp120mn), based on the sequence of the most common North American subtype, HIV-1 MN (Shaw et al. 1984; Gallo et al. 1986). In this synthetic gp120 gene nearly all of the native codons have been systematically replaced with codons most frequently used in highly expressed human genes (FIG. 1). This synthetic gene was assembled from chemically synthesized oligonucleotides of 150 to 200 bases in length. If oligonucleotides exceeding 120 to 150 bases are chemically synthesized, the percentage of full-length product can be low, and the vast excess of material consists of shorter oligonucleotides. Since these shorter fragments inhibit cloning and PCR procedures, it can be very difficult to use oligonucleotides exceeding a certain length. In order to use crude synthesis material without prior purification, single-stranded oligonucleotide pools were PCR amplified before cloning. PCR products were purified in agarose gels and used as templates in the next PCR step. Two ...

These fragments, which were between 350 and 400 bp in length, were cloned into a pCDM7-derived plasmid containing the leader sequence of the CD5 surface antigen followed by a .../PstI/MluI/EcoRI/BamHI polylinker. Each of the restriction enzymes in this polylinker represents a site that is present at the 5' or 3' end of the PCR-generated fragments. Thus, by sequential subcloning of each of the long fragments, the whole gp120 gene was assembled. For each fragment, up to 6 different clones were subcloned and sequenced prior to assembly. A schematic drawing of the construction of the synthetic gp120 is shown in FIG. 2. The sequence of the synthetic gp120 gene (and of a synthetic gp160 gene made using the same approach) is presented in FIG. 1.
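The length limit on crude oligonucleotides follows from stepwise chemical synthesis: an n-mer requires n - 1 coupling steps, so with a constant per-step efficiency p the expected full-length fraction is p^(n-1). The sketch below assumes a 99% coupling efficiency purely for illustration; the efficiency actually achieved is not stated in the text.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Expected fraction of full-length product from stepwise synthesis.

    An n-mer needs n - 1 couplings onto the support-bound first nucleoside,
    so with a constant per-step efficiency p the full-length fraction is p**(n - 1).
    """
    if length_nt < 1 or not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("invalid length or coupling efficiency")
    return coupling_efficiency ** (length_nt - 1)


# Illustrative only: an assumed 99% per-step coupling efficiency.
for n in (60, 120, 150, 200):
    print(n, round(full_length_fraction(n, 0.99), 3))
```

Under that assumption a 150-mer yields roughly 22% full-length product and a 200-mer roughly 14%, so most of the crude material consists of shorter species, consistent with the need to amplify the pools by PCR before cloning.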

The mutation rate was considerable. The most commonly found mutations were short (1 nucleotide) and long (up to ... nucleotides) deletions. In some cases it was necessary to exchange parts with either synthetic adapters or pieces from other subclones without mutation. Some deviations from strict adherence to optimized codon usage were made to accommodate the introduction of restriction sites into the resulting gene to facilitate the replacement of various segments (FIG. 2). These unique restriction sites were introduced into the gene at approximately 100 bp intervals. The native HIV leader sequence was exchanged with the highly efficient leader peptide of the CD5 antigen to facilitate secretion. The plasmid used for construction is a derivative of the mammalian expression vector pCDM7, transcribing the inserted gene under the control of a strong human CMV immediate early promoter.
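Whether a designed sequence really carries unique sites at roughly 100 bp intervals can be checked mechanically. The sketch below uses the standard recognition sequences of the enzymes named in the Figure 2 legend; all of them are palindromic, so scanning one strand finds every site. The helper names are hypothetical, and the sketch ignores methylation sensitivity and partial-site activity.

```python
from __future__ import annotations

# Standard recognition sequences; each is its own reverse complement.
SITES = {
    "HindIII": "AAGCTT", "NheI": "GCTAGC", "PstI": "CTGCAG", "NaeI": "GCCGGC",
    "MluI": "ACGCGT", "EcoRI": "GAATTC", "AgeI": "ACCGGT", "NotI": "GCGGCCGC",
}


def find_all(seq: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def unique_sites(seq: str) -> list[tuple[int, str]]:
    """(position, enzyme) for enzymes that cut exactly once, ordered 5' to 3'."""
    seq = seq.upper()
    found = []
    for name, motif in SITES.items():
        hits = find_all(seq, motif)
        if len(hits) == 1:
            found.append((hits[0], name))
    return sorted(found)


def spacings(sites: list[tuple[int, str]]) -> list[int]:
    """Distances in bp between consecutive unique sites."""
    return [b[0] - a[0] for a, b in zip(sites, sites[1:])]
```

On a toy sequence carrying a HindIII, a PstI and an EcoRI site 100 bp apart, `unique_sites` reports all three and `spacings` returns [100, 100]; a site that occurs twice is dropped because it no longer permits replacement of a single segment.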
To compare the wild-type and synthetic gp120 coding sequences, the synthetic gp120 coding sequence was inserted into a mammalian expression vector and tested in transient transfection assays. Several different native gp120 genes were used as controls to exclude variations in expression levels between different virus isolates and artifacts induced by distinct leader sequences. The gp120 HIV IIIb construct used as a control was generated by PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as template. To exclude PCR-induced mutations, a KpnI/EarI fragment containing approximately 1.2 kb of the gene was exchanged with the respective sequence from the proviral clone. The wildtype gp120mn constructs used as controls were cloned by PCR from HIV-1 MN-infected C8166 cells (AIDS Repository, Rockville, MD) and expressed gp120 with either the native envelope leader or a CD5 leader sequence. Since proviral clones were not available in this case, two clones of each construct were tested to avoid PCR artifacts. To determine the amount of secreted gp120 semi-quantitatively, supernatants of 293T cells transiently transfected by calcium phosphate coprecipitation were immunoprecipitated with soluble CD4:immunoglobulin fusion protein and protein A Sepharose.

The results of this analysis (FIG. 3) show that the synthetic gene product is expressed at a very high level compared to that of the native gp120 controls. The molecular weight of the synthetic gp120 gene product was comparable to that of the control proteins (FIG. 3) and appeared to be in the range of 100 to 110 kd. The slightly faster migration can be explained by the fact that in some tumor cell lines like 293T, glycosylation is either incomplete or altered to some extent.

To compare expression more accurately, gp120 protein levels were quantitated using a gp120 ELISA with CD4 in the solid phase. This analysis (FIG. 4) shows that the ELISA data were comparable to the immunoprecipitation data, with a gp120 concentration below the background cutoff (5 ng/ml) for all the native gp120 genes. Thus, expression of the synthetic gp120 gene appears to be at least one order of magnitude higher than that of the wildtype gp120 genes. In the experiment shown, the increase was at least 25-fold.
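Because the native gp120 levels fell below the 5 ng/ml cutoff, a fold increase computed from these data is a lower bound rather than a point estimate: the true native level is at most the cutoff. A minimal sketch of that reasoning; the 125 ng/ml value in the example is hypothetical, chosen only so that the arithmetic gives a 25-fold bound.

```python
from __future__ import annotations


def fold_increase(synthetic_ng_ml: float,
                  native_ng_ml: float | None,
                  cutoff_ng_ml: float = 5.0) -> tuple[float, bool]:
    """Return (fold increase, is_lower_bound).

    If the native signal is missing or below the assay cutoff, its true level
    is at most the cutoff, so dividing by the cutoff gives a lower bound.
    """
    if native_ng_ml is None or native_ng_ml < cutoff_ng_ml:
        return synthetic_ng_ml / cutoff_ng_ml, True
    return synthetic_ng_ml / native_ng_ml, False
```

For example, a hypothetical 125 ng/ml for the synthetic gene against a native reading below 5 ng/ml returns (25.0, True), i.e. "at least 25-fold".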
The Role of ...

Since rev appears to exert its effect at several steps in the expression of a viral transcript, the possible role of non-translational effects in the improved expression of the synthetic gp120 gene was examined. To test the possibility that negative signal elements conferring either increased mRNA degradation or nuclear retention were eliminated by the codon substitutions, cytoplasmic mRNA levels were measured.

Cytoplasmic RNA was prepared by NP40 lysis of transiently transfected 293T cells and subsequent elimination of the nuclei by centrifugation. The RNA was then purified from the lysates by multiple phenol extractions and precipitation, spotted on nitrocellulose using a slot blot apparatus, and finally hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA from 293 cells transfected with CDM7, gp120IIIb, or syngp120 was isolated 36 hours post-transfection. Cytoplasmic RNA of HeLa cells infected with wildtype vaccinia virus or with recombinant vaccinia virus expressing gp120IIIb or the synthetic gp120 gene under the control of the 7.5 promoter was isolated 16 hours post-infection. Equal amounts were spotted on nitrocellulose using a slot blot device and hybridized with randomly labelled 1.5 kb gp120IIIb and syngp120 fragments or with human beta-actin. RNA expression levels were quantitated by scanning the hybridized membranes with a phosphorimager. The procedures used are described in greater detail below.
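The text does not state how the envelope signals were compared with the beta-actin hybridization. One common approach, shown here only as an assumption, is to divide each envelope signal by the actin signal from the same RNA sample and then express the ratios relative to a reference sample. Signal values are arbitrary phosphorimager units and the names are hypothetical.

```python
from __future__ import annotations


def normalize_to_actin(samples: dict[str, tuple[float, float]],
                       reference: str) -> dict[str, float]:
    """Envelope/actin ratio of each sample, relative to the reference sample.

    `samples` maps a sample name to (envelope_signal, actin_signal).
    """
    ratios = {}
    for name, (target, actin) in samples.items():
        if actin <= 0:
            raise ValueError(f"non-positive actin signal for {name!r}")
        ratios[name] = target / actin
    ref = ratios[reference]
    return {name: r / ref for name, r in ratios.items()}
```

With hypothetical signals of (1000, 500) for the native gene and (800, 500) for the synthetic gene, the synthetic mRNA comes out at 0.8 of the native level, i.e. no increase at the RNA level.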

This experiment demonstrated that there was no significant difference in the mRNA levels of cells transfected with either the native or the synthetic gp120 gene. In fact, in some experiments the cytoplasmic mRNA level of the synthetic gp120 gene was even lower than that of the native gp120 gene.
These data were confirmed by measuring expression from recombinant vaccinia viruses. Human 293 cells or HeLa cells were infected with vaccinia virus expressing wildtype gp120IIIb or syngp120mn at a multiplicity of infection of at least 10. Supernatants were harvested 24 hours post-infection and immunoprecipitated with CD4:immunoglobulin fusion protein and protein A Sepharose. The procedures used in this experiment are described in greater detail below.

This experiment showed that the increased expression of the synthetic gene was still observed when the endogenous gene product and the synthetic gene product were expressed from vaccinia virus recombinants under the control of the strong mixed early and late 7.5k promoter. Because vaccinia virus mRNAs are transcribed and translated in the cytoplasm, increased expression of the synthetic envelope gene in this experiment cannot be attributed to improved export from the nucleus. This experiment was repeated in two additional human cell types, the embryonic kidney cell line 293 and HeLa cells. As with transfected 293T cells, mRNA levels were similar in 293 cells infected with either recombinant vaccinia virus.
Codon Usage in Lentiviruses

Because it appears that codon usage has a significant impact on expression in mammalian cells, the codon frequency in the envelope genes of other retroviruses was examined. This analysis did not reveal a consistent pattern of codon preference among retroviruses in general. However, when the lentiviruses, the group to which HIV-1 belongs, were analyzed separately, a codon usage almost identical to that of HIV-1 was found. A codon frequency table from the envelope glycoproteins of a variety of (predominantly type C) retroviruses excluding the lentiviruses was prepared and compared with a codon frequency table created from the envelope sequences of four lentiviruses not closely related to HIV-1 (caprine arthritis encephalitis virus, equine infectious anemia virus, feline immunodeficiency virus, and visna virus) (Table 2). The codon usage pattern for lentiviruses is strikingly similar to that of HIV-1: in all cases but one, the preferred codon for HIV-1 is the same as the preferred codon for the other lentiviruses. The exception is proline, which is encoded by CCT in 41% of non-HIV lentiviral envelope residues and by CCA in 40% of residues, a situation which clearly also reflects a significant preference for the triplet ending in A. The pattern of codon usage by the non-lentiviral envelope proteins does not show a similar predominance of A residues, and is also not as skewed toward third-position C and G residues as is the codon usage for the highly expressed human genes. In general, non-lentiviral retroviruses appear to exploit the different codons more equally, a pattern they share with less highly expressed human genes.
TABLE 2: Codon frequency in the envelope gene of lentiviruses (lenti) and non-lentiviral retroviruses (other). A question mark denotes a value that is not legible.

Amino acid  Codon  Other  Lenti
Ala         GCC       45     13
            GCT        ?     37
            GCA       20     46
            GCG        9      3
Arg         CGC       14      2
            CGT        6      3
            CGA       16      5
            CGG       17      3
            AGA       31     51
            AGG       15     26
Asn         AAC       49     31
            AAT       51     69
Asp         GAC       55     33
            GAT        ?      ?
Cys         TGC       53     21
            TGT       47     79
Gln         CAA       52     69
            CAG       48     31
Glu         GAA       57      ?
            GAG       43     32
Gly         GGC       21      8
            GGT       13      9
            GGA       37     56
            GGG       29     26
His         CAC       51     38
            CAT       49     62
Ile         ATC       38     16
            ATT       31     22
            ATA       31     61
Leu         CTC       22      8
            CTT       14      9
            CTA       21     16
            CTG       19     11
            TTA       15     41
            TTG       10     16
Lys         AAA       60     63
            AAG       40     37
Phe         TTC       52     25
            TTT       48     75
Pro         CCC       42     14
            CCT       30     41
            CCA       20     40
            CCG        7      5
Ser         TCC       38     10
            TCT       17     16
            TCA       18     24
            TCG        6      5
            AGC       13     20
            AGT        7     25
Thr         ACC       44     18
            ACT       27     20
            ACA       19     55
            ACG       10      8
Tyr         TAC       48     28
            TAT       52     72
Val         GTC       36      9
            GTT       17     10
            GTA       22     54
            GTG       25     27

Codon frequency was calculated using the GCG program established by the University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases in which a particular codon is used. Codon usage of non-lentiviral retroviruses was compiled from the envelope precursor sequences of bovine leukemia virus, feline leukemia virus, human T-cell leukemia virus type I, ..., the mink cell focus-forming isolate of murine leukemia virus (MuLV), the ... spleen focus-forming isolate, the 10A1 isolate, the 4070A amphotropic isolate, the myeloproliferative leukemia virus isolate, rat leukemia virus, simian sarcoma virus, and .... The codon frequency table for the lentiviruses was compiled from the envelope precursor sequences of caprine arthritis encephalitis virus, equine infectious anemia virus, feline immunodeficiency virus, and visna virus.

In the lentiviral envelope sequences, the third position for alanine, proline, serine and threonine triplets is rarely G. The retroviral envelope triplets show a similar but less pronounced underrepresentation of CpG. The most obvious difference between the two groups lies in the use of the CGX variant of arginine triplets, which is reasonably frequently represented among the retroviral envelope coding sequences but is much less frequently present in the lentivirus sequences.

Rev Regulation of the Native and Synthetic gp120 Genes

To examine whether regulation by rev is connected to HIV-1 codon usage, the influence of rev on the expression of both the native and the synthetic gene was investigated. Since regulation by rev requires the rev-binding site RRE in cis, constructs were made in which this binding site was cloned into the 3' untranslated region of both the native and the synthetic gene. These plasmids were cotransfected with rev or a control plasmid in trans into 293T cells, and gp120 expression levels in supernatants were measured semiquantitatively by immunoprecipitation. The procedures used in this experiment are described in greater detail below.

As shown in FIG. 5, panels A and B, rev upregulates the native gp120 gene but has no effect on the expression of the synthetic gp120 gene. Thus, the action of rev is not apparent on a substrate which lacks the endogenous viral envelope coding sequence.

Expression of a Synthetic Rat THY-1 Gene Having Envelope Codons

The above-described experiment suggests that "envelope sequences" in fact have to be present for rev regulation. In order to test this hypothesis, a synthetic version of the gene encoding the small, typically highly expressed cell surface protein rat THY-1 antigen was prepared. The synthetic version of the rat THY-1 gene was designed to have a codon usage like that of HIV gp120. In designing this synthetic gene, AUUUA sequences, which are associated with mRNA instability, were avoided. In addition, two restriction sites were introduced to simplify manipulation of the resulting gene (FIG. 6). This synthetic gene with the HIV envelope codon usage (rTHY-1env) was generated using three 150- to 170-mer oligonucleotides (FIG. 7). In contrast to the syngp120mn gene, PCR products were directly cloned and assembled in pUC12, and subsequently cloned into pCDM7.
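The design constraints described here (envelope-like codon usage, no AUUUA) can be sketched as a greedy back-translation. The actual codon table used for rTHY-1env is not reproduced in this text, so the sketch assumes, as a stand-in, the A-rich bias described earlier for the envelope gene: synonymous codons are ranked by A content, ties broken alphabetically, and an explicit per-amino-acid preference list can override that ranking. ATTTA is the DNA form of AUUUA. Names are hypothetical; a greedy choice can occasionally dead-end, in which case the function raises an error.

```python
from __future__ import annotations

# Standard genetic code (DNA alphabet), generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def adenine_ranked_codons(aa: str) -> list[str]:
    """Synonymous codons ordered by A content (most A first), ties alphabetical."""
    return sorted((c for c, a in CODON_TO_AA.items() if a == aa),
                  key=lambda c: (-c.count("A"), c))


def back_translate(protein: str,
                   preferences: dict[str, list[str]] | None = None,
                   forbidden: str = "ATTTA") -> str:
    """Greedy back-translation that never creates the `forbidden` motif."""
    seq = ""
    window = len(forbidden) + 2  # a new motif must overlap the 3 nt just added
    for aa in protein.upper():
        choices = (preferences or {}).get(aa) or adenine_ranked_codons(aa)
        for codon in choices:
            candidate = seq + codon
            if forbidden not in candidate[-window:]:
                seq = candidate
                break
        else:
            raise ValueError(f"no acceptable codon for {aa!r} at this position")
    return seq
```

With a preference list that puts AAT first for Asn and TTA first for Leu, the dipeptide Asn-Leu would read AATTTA; the sketch rejects TTA there and falls back to CTA, giving AATCTA.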
Expression levels of native rTHY-1 and of rTHY-1 with the HIV envelope codons were quantitated by immunofluorescence of transiently transfected 293T cells. FIG. 8 shows that the expression of the native THY-1 gene is about two orders of magnitude above the background level of the control transfected cells (pCDM7). In contrast, expression of the synthetic rat THY-1 is markedly lower than that of the native gene, with a shift towards lower channel numbers.

To prove that no negative sequence elements promoting degradation were inadvertently introduced, a construct was generated in which the rTHY-1env gene was cloned at the 3' end of the synthetic gp120 gene (FIG. 9, panel B). In this experiment 293T cells were transiently transfected with either syngp120mn or the construct containing both genes (syngp120mn.rTHY-1env). Expression was measured by immunoprecipitation with CD4:IgG fusion protein and protein A agarose. The procedures used in this experiment are described in greater detail below.

Because the synthetic gp120 gene has a UAG stop codon, rTHY-1env is not translated from this transcript. If negative sequence elements conferring enhanced degradation were present, the amount of gp120 protein expressed from this construct should be decreased in comparison to the syngp120mn construct without rTHY-1env. FIG. 9, panel A, shows that the expression of both constructs is similar, indicating that the low expression of rTHY-1env must be linked to translation.



Rev-dependent expression of synthetic rat THY-1 gene with envelope codons

To explore whether rev is able to regulate expression of a rat THY-1 gene having env codons, a construct was made with a rev-binding site at the 3' end of the rTHY-1env open reading frame. To measure rev-responsiveness of the rat THY-1env construct having a 3' RRE, human 293T cells were cotransfected with ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours post transfection, cells were detached with 1 mM EDTA in PBS and stained with the OX-7 anti-rTHY-1 mouse monoclonal antibody and a secondary FITC-conjugated antibody. Fluorescence intensity was measured using an EPICS XL cytofluorometer. These procedures are described in greater detail below.
In repeated experiments, a slight increase of rTHY-1env expression was detected when rev was cotransfected with the rTHY-1env gene. To further increase the sensitivity of the assay system, a construct expressing a secreted version of rTHY-1env was generated. This construct should produce more reliable data because the accumulated amount of secreted protein in the supernatant reflects protein production over an extended period, whereas surface expressed protein appears to reflect more closely the current production rate. A gene capable of expressing a secreted form was prepared by PCR using forward and reverse primers annealing 3' of the endogenous leader sequence and 5' of the sequence motif required for phosphatidylinositol glycan anchorage, respectively. The PCR product was cloned into a plasmid which already contained a CD5 leader sequence, thus generating a construct in which the membrane anchor has been deleted and the leader sequence exchanged for a heterologous (and probably more efficient) leader peptide.
The rev-responsiveness of the secreted form of ratTHY-1env was measured by immunoprecipitation of supernatants of human 293T cells cotransfected with a plasmid expressing a secreted form of ratTHY-1env and the RRE sequence in cis (rTHY-1envPI-RRE) and either CDM7 or pCMVrev. The rTHY-1envPI-RRE construct was made by PCR using the oligonucleotides cgcggggctagcgcaaagagtaataagtttaac as forward and cgcggatcccttgtattttgtactaataa as reverse primers and the synthetic rTHY-1env construct as template. After digestion with NheI and NotI, the PCR fragment was cloned into a plasmid containing CD5 leader and RRE sequences. Supernatants of 35S-labelled cells were harvested 72 hours post transfection, precipitated with the mouse monoclonal antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE.
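The primers above carry restriction sites at their 5' ends for cloning. A minimal Python sketch can report where such sites occur in a primer as printed; the helper name `find_sites` is hypothetical, and the recognition sequences are the standard ones for the enzymes named in this document. One strand is scanned, which suffices because all of these recognition sequences are palindromic.

```python
# Standard recognition sequences of enzymes named in this document.
SITES = {
    "NheI": "GCTAGC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "EcoRI": "GAATTC",
    "PstI": "CTGCAG",
    "MluI": "ACGCGT",
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme with at least one site in seq to its 0-based positions."""
    s = seq.upper()
    hits = {}
    for name, site in SITES.items():
        positions = [i for i in range(len(s) - len(site) + 1)
                     if s.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits


forward = "cgcggggctagcgcaaagagtaataagtttaac"
reverse = "cgcggatcccttgtattttgtactaataa"
print(find_sites(forward))  # {'NheI': [6]}
print(find_sites(reverse))  # {'BamHI': [3]}
```

The scan only reports what is present in the sequence it is given; it says nothing about which enzymes were actually used in a given cloning step.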

In this experiment the induction of rTHY-1env by rev was much more prominent and clear-cut than in the experiments described above, strongly suggesting that rev is able to translationally regulate transcripts that are suppressed by low-usage codons.




rTHY-1env:immunoglobulin fusion protein

To test whether low-usage codons must be present throughout the whole coding sequence or whether a short region is sufficient to confer rev-responsiveness, an rTHY-1env:immunoglobulin fusion protein was generated. In this construct the rTHY-1env gene (without the sequence motif responsible for phosphatidylinositol glycan anchorage) is linked to the human IgG1 hinge, CH2 and CH3 domains. This construct was generated by anchor PCR using primers with NheI and BamHI restriction sites and rTHY-1env as template. The PCR fragment was cloned into a plasmid containing the leader sequence of the CD5 surface molecule and the hinge, CH2 and CH3 parts of human IgG1 immunoglobulin. A HindIII/EagI fragment containing the rTHY-1enveg1 insert was subsequently cloned into a pCDM7-derived plasmid with the RRE sequence.


To measure the response of the rTHY-1env/immunoglobulin fusion gene (rTHY-1enveg1rre) to rev, human 293T cells were cotransfected with rTHY-1enveg1rre and either
WO 96/09378 



PCT/US95/11511 



- 22 - 



pCDM7 or pCMVrev. The rTHY-1enveg1rre construct was made by anchor PCR using forward and reverse primers with NheI and BamHI restriction sites, respectively. The PCR fragment was cloned into a plasmid containing a CD5 leader and human IgG1 hinge, CH2 and CH3 domains. Supernatants of 35S-labelled cells were harvested 72 hours post transfection, precipitated with the mouse monoclonal antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE. The procedures used are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this rTHY-1env/immunoglobulin fusion protein is secreted into the supernatant. Thus, this gene should be responsive to rev-induction. However, in contrast to rTHY-1envPI-, cotransfection of rev in trans induced no or only a negligible increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion protein with native rTHY-1 or HIV envelope codons was measured by immunoprecipitation. Briefly, human 293T cells were transfected with either rTHY-1enveg1 (env codons) or rTHY-1wteg1 (native codons). The rTHY-1wteg1 construct was generated in a manner similar to that used for the rTHY-1enveg1 construct, except that a plasmid containing the native rTHY-1 gene was used as template. Supernatants of 35S-labelled cells were harvested 72 hours post transfection, precipitated with the mouse monoclonal antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE. The procedures used in this experiment are described in greater detail below.

Expression levels of rTHY-1enveg1 were decreased in comparison to a similar construct with wildtype rTHY-1 as the fusion partner, but were still considerably higher than those of rTHY-1env. Accordingly, both parts of the fusion protein influenced expression levels. The addition of rTHY-1env did not restrict expression to the level seen for rTHY-1env alone. Thus, rev appears to be ineffective if expression is not almost completely suppressed.

[...] A simple measure of the statistical significance of this codon preference is the finding that the [...] third residue is A or U in all [...] (nine of [...]), [...] approximately 0.004, and hence, by conventional measures, the third residue choice cannot be considered random. Further evidence of a skewed codon preference is found among the more degenerate codons, where selection for triplets bearing adenine can be seen. This contrasts with the pattern for highly expressed genes, which favor codons bearing C or G in the third position [...] degeneracy.
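The third-position bias discussed above can be quantified directly from a coding sequence. This Python sketch is an illustration under stated assumptions, not the analysis the passage refers to (whose details are not legible in this copy): it computes the fraction of codons ending in A or T/U and, separately, an exact one-sided binomial tail probability under a hypothetical null of p = 0.5. The function names are hypothetical; the example inputs are the first six codons of rTHY-1env oligo 1 and of syngp120mn oligo 1 as listed in the procedures below.

```python
from math import comb


def third_position_at_fraction(seq: str) -> float:
    """Fraction of complete codons whose third base is A or T (U)."""
    s = seq.upper().replace("U", "T")
    thirds = [s[i + 2] for i in range(0, len(s) - 2, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "AT" for base in thirds) / len(thirds)


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), an exact one-sided tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"{third_position_at_fraction('atgaatccagtaataagt'):.2f}")  # 0.83 (rTHY-1env)
print(f"{third_position_at_fraction('accgagaagctgtgggtg'):.2f}")  # 0.00 (syngp120mn)
print(binomial_upper_tail(8, 8))  # 0.00390625
```

The choice of p = 0.5 treats A/U and G/C third positions as equally likely a priori; a test against the genome's actual base composition would use a different p.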






The systematic exchange of native codons with preferred codons dramatically increased the expression of gp120. A quantitative analysis by ELISA showed that expression of the synthetic gene was at least 25-fold higher in comparison to native gp120 after transient transfection into human 293 cells. The concentration levels in the ELISA experiment shown were rather low because the ELISA used is based on [...] gp120, which is only [...] detected; this explains the apparently low expression. Measurement of cytoplasmic mRNA levels demonstrated that the difference in protein expression is due to translational differences and not to mRNA stability.
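The codon exchange can be sketched as a table lookup. This minimal Python example assumes the preferred codons listed in the Summary of the Invention and the standard genetic code, and replaces each codon with the preferred codon for its amino acid. Codons for amino acids without a listed preferred codon (Glu, Met and Trp, as the list is printed) and stop codons are left unchanged. The function names are hypothetical, and the sketch ignores other design constraints mentioned in this document, such as avoiding AUUUA motifs and placing restriction sites.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred codons as listed in the Summary of the Invention (DNA alphabet).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "V": "GTG",
}


def codons(seq: str) -> list[str]:
    s = seq.upper().replace("U", "T")
    if len(s) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [s[i:i + 3] for i in range(0, len(s), 3)]


def translate(seq: str) -> str:
    return "".join(CODE[c] for c in codons(seq))


def to_preferred(seq: str) -> str:
    """Swap each codon for the preferred codon of the same amino acid."""
    return "".join(PREFERRED.get(CODE[c], c) for c in codons(seq))


native = "GCAAGAAATGAA"  # hypothetical Ala-Arg-Asn-Glu fragment
synthetic = to_preferred(native)
print(synthetic)                                  # GCCCGCAACGAA
print(translate(native) == translate(synthetic))  # True
```

Checking that the translation is unchanged is the essential invariant: the substitution alters codon usage only, never the encoded protein.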

Retroviruses in general do not show a preference for A and T similar to that found for HIV. However, if this family is divided into two subgroups, lentiviruses and non-lentiviral retroviruses, a similar preference for A and, less frequently, T is detected at the third codon position of lentiviruses. Thus, the available evidence suggests that lentiviruses retain a characteristic pattern of envelope codons not because of an inherent advantage to the reverse transcription or replication of such residues, but rather for some reason peculiar to the physiology of that class of viruses. As already mentioned, the major difference between lentiviruses and non-complex retroviruses is the presence of additional regulatory and nonessential accessory genes in lentiviruses. Thus, one simple explanation for the restriction of envelope expression might be that an important regulatory mechanism of one of these additional molecules is based on it. In fact, one of these proteins, rev, most likely has homologues in all lentiviruses. Thus, codon usage in viral mRNA is used to create a class of transcripts that is susceptible to the stimulatory action of rev. This hypothesis was proved using a strategy similar to that described above, but this time codon usage was changed in the inverse direction: the codons of a highly expressed cellular gene were substituted with the most frequently used codons in the HIV envelope. As expected, expression levels were considerably lower in comparison to the native molecule, almost two orders of magnitude when analyzed by immunofluorescence of the surface expressed molecule (see 4.7). If rev was coexpressed in trans and an RRE element was present in cis, only a slight induction was found for the surface molecule. However, if THY-1 was expressed as a secreted protein, the induction by rev was much more prominent, supporting the above hypothesis. This can probably be explained by the accumulation of secreted protein in the supernatant, which integrates production over an extended period and thereby makes an increase easier to detect.

In general, induction of HIV envelope by rev cannot have the purpose of an increased surface abundance, but rather of an increased intracellular level. It is completely unclear at the moment why this should be the case.

To test whether small subregions of a gene are sufficient to restrict expression and render it rev-dependent, rTHY-1env:immunoglobulin fusion proteins were generated in which only about one third of the total gene had the envelope codon usage. Expression levels of this construct were intermediate, indicating that the rTHY-1env negative sequence element is not dominant over the immunoglobulin part. This fusion protein was not, or only slightly, rev-responsive, indicating that only genes whose expression is almost completely suppressed can be rev-responsive.

Another characteristic feature found in the codon frequency tables is a striking underrepresentation of CpG-containing triplets. In a comparative study of codon usage in E. coli, yeast, Drosophila and primates, it was shown that in a high number of analyzed primate genes the 8 least used codons include all codons with the CpG dinucleotide sequence. Avoidance of codons containing this dinucleotide motif was also found in the sequences of other retroviruses. It seems plausible that the underrepresentation of CpG-bearing triplets is related to the avoidance of gene silencing by methylation of CpG cytosines. The observed number of CpG dinucleotides for HIV as a whole is about one fifth of that expected on the basis of the base composition. This might indicate that the possibility of high expression is retained, and that the gene in fact has to be highly expressed at some point during viral pathogenesis.
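The CpG deficit can be expressed as an observed/expected ratio. This sketch assumes the widely used definition O/E = N_CG x L / (N_C x N_G), where L is the sequence length, because under independent base composition the expected CpG count is about N_C x N_G / L. Under that definition, "one fifth of that expected" corresponds to a ratio of about 0.2. The function name is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio, N_CG * L / (N_C * N_G)."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        raise ValueError("ratio undefined without both C and G")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    n_cg = s.count("CG")
    return n_cg * len(s) / (n_c * n_g)


print(cpg_observed_expected("CCGG"))  # 1.0
```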

The results presented herein clearly indicate that codon preference has a severe effect on protein levels, and suggest that translational elongation controls mammalian gene expression. However, other factors may play a role. First, the abundance of mRNAs in eukaryotic cells that are not maximally loaded with ribosomes indicates that initiation is rate limiting for translation in at least some cases, since otherwise all transcripts would be completely covered by ribosomes. Furthermore, if ribosome stalling and subsequent mRNA degradation were the mechanism, suppression by rare codons could most likely not be reversed by any regulatory mechanism like the one presented herein. One possible explanation for the influence of both initiation and elongation on translational activity is that the rate of initiation, or access to ribosomes, is controlled in part by cues distributed throughout the RNA, such that the lentiviral codons predispose the RNA to accumulate in a pool of poorly initiated RNAs. However, this limitation need not be kinetic; for example, the choice of codons could influence the probability that a given translation product, once initiated, is properly completed. Under this mechanism, an abundance of less favored codons would incur a significant cumulative probability of failure to complete the nascent polypeptide chain. The sequestered RNA would then be lent an improved rate of initiation by the action of rev. Since adenine residues are abundant in rev-responsive transcripts, it could be that RNA adenine methylation mediates this translational suppression.
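The cumulative-failure argument can be made concrete with a simple model. This sketch assumes, hypothetically, that each codon independently aborts elongation with some small probability; the probabilities in the example are illustrative rather than measured values, and the function name is hypothetical.

```python
import math


def completion_probability(failure_probs) -> float:
    """Probability that an initiated chain is completed, assuming each codon
    independently aborts elongation with its own probability."""
    return math.prod(1.0 - p for p in failure_probs)


# Hypothetical per-codon failure probabilities over a 100-codon stretch.
print(f"{completion_probability([0.01] * 100):.3f}")   # 0.366
print(f"{completion_probability([0.001] * 100):.3f}")  # 0.905
```

Under these assumptions, even a 1% per-codon failure rate over 100 codons leaves only about a third of initiated chains completed, which illustrates how small per-codon effects compound.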





Procedures

The following procedures were used in the experiments described above.





Sequence analysis

Sequence analysis employed the software developed by the University of Wisconsin Genetics Computer Group.

DNA constructions employed the following methods. Vector and insert DNA was digested at a concentration of 0.5 µg/10 µl in the appropriate restriction buffer for 1 to 4 hours (total reaction volume approximately 30 µl). Digested vector was treated with [...] calf intestine phosphatase [...]. Both vector and insert digests (5 to 10 µl each) were run on a 1% low melting agarose gel with TAE buffer. Gel slices containing the bands of interest were transferred into a reaction tube, melted at 65°C and added directly to the ligation without removal of the agarose. Ligations were done in a total volume of 25 µl in [...] buffer, 1x Ligation Additions, with 200 to 400 U of ligase, 1 µl of vector, and 4 µl of insert. When necessary, 5' overhanging ends were filled by adding 1/10 volume of 250 µM dNTPs and 2 to 5 U of Klenow polymerase to heat inactivated or phenol extracted digests and incubating for approximately 20 min at room temperature. When necessary, 3' overhanging ends were filled by adding 1/10 volume of [...] and [...] to heat inactivated or phenol extracted digests, followed by incubation at 37°C for 30 min. The following buffers were used in these reactions: 10x Low buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Medium buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (50 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Ligation additions (1 mM ATP, 20 mM DTT, 1 mg/ml BSA, 10 mM spermidine); 50x TAE (2 M Tris acetate, 50 mM EDTA).
Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750 synthesizer (Millipore). The columns were eluted with 1 ml of 30% ammonium hydroxide, and the eluted oligonucleotides were deblocked at 55°C for 6 to 12 hours. After deblocking, 150 µl of oligonucleotide were precipitated with 10 volumes of unsaturated n-butanol in 1.5 ml reaction tubes, followed by centrifugation at 15,000 rpm in a microfuge. The pellet was washed with 70% ethanol and resuspended in 50 µl of H2O. The concentration was determined by measuring the optical density at 260 nm in a dilution of 1:333 (1 OD260 = 30 µg/ml).
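The concentration arithmetic above is easy to write out. This sketch assumes the stated 1:333 dilution and 1 OD260 = 30 µg/ml; the helper names are hypothetical and the A260 reading of 0.1 in the example is a placeholder, not a reported value.

```python
def oligo_conc_ug_per_ml(a260: float, dilution: float = 333.0,
                         ug_per_ml_per_od: float = 30.0) -> float:
    """Stock concentration from the A260 of a diluted aliquot."""
    return a260 * dilution * ug_per_ml_per_od


def total_yield_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Total amount in a given volume (µl converted to ml)."""
    return conc_ug_per_ml * volume_ul / 1000.0


conc = oligo_conc_ug_per_ml(0.1)                 # hypothetical A260 reading
print(f"{conc:.0f} µg/ml")                       # 999 µg/ml, about 1 µg/µl
print(f"{total_yield_ug(conc, 50):.2f} µg in 50 µl")  # 49.95 µg in 50 µl
```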

The following oligonucleotides were used for construction of the synthetic gp120 gene (all sequences shown in this text are in 5' to 3' direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag ctg (SEQ ID NO: 1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac ggc gtg ccc gtg tgg aag ag ag gcc acc acc acc ctg ttc tgc gcc agc gac gcc aag gcg tac gac acc gag gtg cac aac gtg tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag gag gtg gag ctc gtg aac gtg acc gag aac ttc aac atg (SEQ ID NO: 2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga agt tct c (SEQ ID NO: 3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa gaa caa cat (SEQ ID NO: 4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat gag gac atc atc agc ctg tgg gac cag agc ctg aag ccc tgc gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc acc gac ctg
agg aac acc acc aac acc aac ac agc acc gcc aac [...] agc aac agc gag ggc acc atc aag ggc ggc gag atg (SEQ ID NO: 5).

oligo 2 reverse (PstI): [...] gtt ctt cat ctc gcc gcc ctt (SEQ ID NO: 6).

oligo 3 forward (PstI): gaa gaa ctg cag [...] cat cac cac cag c (SEQ ID NO: 7).

oligo 3: aac atc acc acc aac atc [...] gag tac gcc [...] ctg [...] atc gac aac [...] gat atc [...] gac aac gac agc acc [...] cgc ctg atc tcc tgc aac acc agc gtg atc acc cag gcc tgc ccc aag [...] gag ccc atc ccc atc cac tac tgc gcc ccc gcc ggc ttc gcc (SEQ ID NO: 8).

oligo 3 reverse: [...] gaa gcc ggc ggg (SEQ ID NO: 9).

oligo 4 forward: [...] (SEQ ID NO: 10).

oligo 4: [...] (SEQ ID NO: 11).

oligo 4 reverse: [...] (SEQ ID NO: 12).

oligo 5 forward (MluI): [...] acg cgt ccc (SEQ ID NO: 13).

oligo 5: aac tgc acg cgt ccc aa[...] aag cgc atc cac atc ggc ccc ggg cgc gcc ttc [...] tgc aac atc tct aga (SEQ ID NO: 14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat gtt gca (SEQ ID NO: 15).

oligo 6 forward: gca aca tct cta gag cca agt gga acg ac (SEQ ID NO: 16).
oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc gtg agc aag ctg aag gag cag ttc aag aac aag acc atc gtg ttc ac cag agc agc ggc ggc gac ccc gag atc gtg atg cac agc ttc aac tgc ggc ggc (SEQ ID NO: 17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc gca gtt ga (SEQ ID NO: 18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat tct tct act gc (SEQ ID NO: 19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc ctg ttc aac agc acc tgg aac ggc aac aac acc tgg aac aac acc acc ggc agc aac aac aat att acc ctc cag tgc aag atc aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg tac gcc ccc ccc atc gag ggc cag atc cgg tgc agc agc (SEQ ID NO: 20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc acc gga tct ggc cct c (SEQ ID NO: 21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag caa cat cac cgg tct g (SEQ ID NO: 22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac ggc ggc aag gac acc gac acc aac gac acc gaa atc ttc cgc ccc ggc ggc ggc gac atg cgc gac aac tgg aga tct gag ctg tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc ccc acc aag gcc aag cgc cgc gtg gtg cag cgc gag aag cgc (SEQ ID NO: 23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc ttc tcg cgc tgc acc ac (SEQ ID NO: 24).
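In the list above, each reverse primer is the reverse complement of the sequence spanning the junction between one long oligonucleotide and the next, which is what allows adjacent PCR products to be joined. A minimal Python check, with hypothetical names and sequences taken from the gp120 list, confirms this for oligo 1 reverse and oligo 7 reverse.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


# (reverse primer, 3' end of upstream oligo, 5' end of downstream oligo)
JUNCTIONS = [
    ("ccaccatgttgttcttccacatgttgaagttctc",            # oligo 1 reverse
     "gagaacttcaacatg", "tggaagaacaacatggtgg"),       # oligo 1 | oligo 2
    ("gcagaccggtgatgttgctgctgcaccggatctggccctc",      # oligo 7 reverse
     "gagggccagatccggtgcagcagc", "aacatcaccggtctgc"), # oligo 7 | oligo 8
]

for primer, upstream_tail, downstream_head in JUNCTIONS:
    print(revcomp(primer) == upstream_tail + downstream_head)  # True
```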

The following oligonucleotides were used for the construction of the ratTHY-1env gene.

oligo 1 forward (BamHI/HindIII): cgc ggg gga tcc aag ctt acc atg att cca gta ata agt (SEQ ID NO: 25).

oligo 1: atg aat cca gta ata agt ata aca tta tta tta agt gta tta caa atg agt aga gga caa aga gta ata agt tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt tca tta acg (SEQ ID NO: 26).
oligo 1 reverse (EcoRI/MluI): cgc gaa ttc acg cgt taa tga aaa ttc atg ttg (SEQ ID NO: 27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt gaa aaa aaa aaa cat (SEQ ID NO: 28).

oligo 2: [...] aaa aaa cat ata tta agt gga [...] cca gaa cat aca tat aga [...] ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID NO: 29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc aca cat ata atc tcc (SEQ ID NO: 30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc aga gta agt gga caa (SEQ ID NO: 31).

oligo 3: [...] gta agt gga caa aat cc[...] aca agt [...] ca ata aat gta ata aga [...] tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta tta tta tta tta tta agt tta agt ttt tta [...] tta caa gca aca gat ttt ata agt tta tga (SEQ ID NO: 32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc gct tca taa act tat aaa atc (SEQ ID NO: 33).

Polymerase chain reaction

Short, overlapping 15 to 25 mer oligonucleotides annealing at both ends were used to amplify the long oligonucleotides by polymerase chain reaction (PCR). Typical PCR conditions were: 35 cycles, 55°C annealing temperature, 0.2 sec extension time. PCR products were gel purified, phenol extracted, and used in a subsequent PCR to generate longer fragments consisting of two adjacent small fragments. These longer fragments were cloned into a CDM7-derived plasmid containing a leader sequence of the CD5 surface molecule followed by an NheI/PstI/MluI/EcoRI/BamHI polylinker.
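Joining two adjacent PCR products into a longer fragment relies on a shared end-to-end overlap. As an in-silico illustration only, and not a model of the PCR itself, the Python sketch below finds the longest suffix/prefix overlap between two fragments and merges them. The helper names and the 15-base minimum overlap are assumptions. The example fragments are the 3' end of the oligo 1 product (ending in the complement of oligo 1 reverse) and the 5' end of the oligo 2 product (beginning with oligo 2 forward) from the gp120 list.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 15) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def join_fragments(a: str, b: str, min_len: int = 15) -> str:
    """Merge two fragments that share an end-to-end overlap."""
    k = suffix_prefix_overlap(a, b, min_len)
    if k == 0:
        raise ValueError("no overlap of at least min_len bases")
    return a + b[k:]


upstream = "gtgaccgagaacttcaacatgtggaagaacaacatggtgg"
downstream = "gaccgagaacttcaacatgtggaagaacaacatggtggagcagatgcat"
print(suffix_prefix_overlap(upstream, downstream))  # 38
print(join_fragments(upstream, downstream))
```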

The following solutions were used in these reactions: 10x PCR buffer (500 mM KCl, 100 mM Tris HCl, pH 7.5, 8 mM MgCl2, 2 mM each dNTP). The final buffer was supplemented with 10% DMSO to increase fidelity of the Taq polymerase.

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB cultures for more than 6 hours or overnight. Approximately 1.5 ml of each culture was poured into 1.5 ml microfuge tubes and spun for 20 seconds to pellet the cells, which were resuspended in 200 µl of solution I. Subsequently, 400 µl of solution II and 300 µl of solution III were added. The microfuge tubes were capped, mixed and spun for > 30 sec. Supernatants were transferred into fresh tubes and phenol extracted once. DNA was precipitated by filling the tubes with isopropanol, mixing, and spinning in a microfuge for > 2 min. The pellets were rinsed in 70% ethanol and resuspended in 50 µl dH2O containing 10 µl of RNAse A. The following media and solutions were used in these procedures: LB medium (1.0% NaCl, 0.5% yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH 8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl, pH 7.5, 1 mM EDTA pH 8.0).
Large scale DNA preparation

One liter cultures of transformed bacteria were grown 24 to 36 hours (MC1061/p3 transformed with pCDM derivatives) or 12 to 16 hours (MC1061 transformed with pUC derivatives) at 37°C in either M9 bacterial medium (pCDM derivatives) or LB (pUC derivatives). Bacteria were spun down in 1 liter bottles using a Beckman J6 centrifuge at 4,200 rpm for 20 min. The pellet was resuspended in 40 ml of solution I. Subsequently, 80 ml of solution II and 40 ml of solution III were added and the bottles were shaken semivigorously until lumps of 2 to 3 mm size developed. The bottle was spun at 4,200 rpm for 5 min and the supernatant was poured through
cheesecloth into a 250 ml bottle. Isopropanol was added to the top of the bottle, which was then spun at [...] rpm for 10 min. The pellet was resuspended in 4.1 ml of solution I and added to 4.5 g of cesium chloride and 0.3 ml of [...]. The mixture was spun in a Beckman [...] centrifuge at 10,000 rpm for [...] min. The supernatant was transferred into Beckman Quick Seal tubes, which were then sealed and spun in an ultracentrifuge using a fixed angle rotor at 80,000 rpm for [...] hours. The band was extracted by visible light using a 1 ml syringe and a 20 gauge needle. An equal volume of dH2O was added to the extracted material, which was then extracted once with n-butanol saturated with 1 M sodium chloride, followed by the addition of an equal volume of [...] ammonium acetate/1 mM EDTA. The material was mixed with [...] ethanol and spun in a Beckman [...] centrifuge at [...]. The pellet was rinsed with 70% ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA concentration was determined by measuring the optical density at 260 nm in a dilution of [...].


The following media and buffers were used in these procedures: M9 bacterial medium (10 g M9 salts, 10 g casamino acids (hydrolysed), 10 ml M9 additions, 7.5 µg/ml tetracycline (500 µl of a 15 mg/ml stock solution), [...] µg/ml ampicillin (125 µl of a 10 mg/ml stock solution)); M9 additions (10 mM CaCl2, 100 mM MgSO4, [...] glycerol); LB medium (1.0% NaCl, 0.5% yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH 8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M KOAc, 2.5 M HOAc).
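The antibiotic additions to M9 medium can be checked with a one-line dilution formula. This sketch assumes that the additions are made per 1 liter of medium (the culture volume used for large scale preparations) and uses the tetracycline figures as given above; the helper name is hypothetical.

```python
def final_ug_per_ml(stock_mg_per_ml: float, stock_ul: float,
                    medium_ml: float) -> float:
    """Final concentration after adding stock_ul of stock to medium_ml.

    1 mg/ml equals 1 µg/µl, so mg/ml x µl gives µg; dividing by ml gives
    µg/ml. The small added volume is neglected.
    """
    return stock_mg_per_ml * stock_ul / medium_ml


print(final_ug_per_ml(15, 500, 1000))  # 7.5 (tetracycline, per liter)
```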
Sequencing

Synthetic genes were sequenced by the Sanger dideoxynucleotide method. In brief, 20 to 50 µg of double-stranded plasmid DNA were denatured in 0.5 M NaOH for 5 min. Subsequently the DNA was precipitated with 1/10 volume of sodium acetate (pH 5.2) and 2 volumes of ethanol and centrifuged for 5 min. The pellet was washed with 70% ethanol and resuspended at a concentration of 1 µg/µl. The annealing reaction was carried out with 4 µg of template DNA and 40 ng of primer in 1x annealing buffer in a final volume of 10 µl. The reaction was heated to 65°C and slowly cooled to 37°C. In a separate tube, 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of dH2O, 1 µl of [35S]dATP (10 µCi), and 0.25 µl of Sequenase (12 U/µl) were combined for each reaction. Five µl of this mix were added to each annealed primer-template tube and incubated for 5 min at room temperature. For each labeling reaction, 2.5 µl of each of the 4 termination mixes were added to a Terasaki plate and prewarmed at 37°C. At the end of the incubation period, 3.5 µl of labeling reaction were added to each of the 4 termination mixes. After 5 min, 4 µl of stop solution were added to each reaction and the Terasaki plate was incubated at 80°C for 10 min in an oven. The sequencing reactions were run on a 5% denaturing polyacrylamide gel. An acrylamide solution was prepared by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to 100 g of acrylamide:bisacrylamide (29:1). A 5% polyacrylamide, 46% urea, 1x TBE gel was prepared by combining 38 ml of acrylamide solution and 28 g urea. Polymerization was initiated by the addition of 400 µl of 10% ammonium peroxodisulfate and 60 µl of TEMED. Gels were poured using silanized glass plates and sharktooth combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4 hours (depending on the region to be read). Gels were
transferred to Whatman blotting paper, dried [...] for about 1 hour, and exposed to x-ray film at room temperature. Typically, exposure [...]. The following solutions were used in these procedures: Annealing buffer (200 mM Tris HCl, pH [...], 100 mM MgCl2, [...] NaCl); Labelling Mix ([...]); Termination Mixes (80 µM [...] dNTP [...]); Stop solution ([...] xylene cyanol); 5x TBE (0.9 M Tris borate, 20 mM EDTA); Polyacrylamide solution ([...] acrylamide, 200 ml 10x TBE, 957 ml dH2O).




Cytoplasmic RNA isolation

Cytoplasmic RNA was isolated from calcium phosphate transfected 293T cells 36 hours post transfection. Cells were lysed in lysis buffer, nuclei were spun out, and SDS and proteinase K were added to [...] and 0.2 mg/ml, respectively. The cytoplasmic extracts were incubated at 37°C for 20 min, phenol/chloroform extracted twice, and precipitated. The RNA was dissolved in 100 µl buffer I and incubated at 37°C for 20 min. The reaction was stopped by adding [...] µl stop buffer, and the RNA was precipitated again. The following solutions were used in this procedure: Lysis Buffer (TE containing 50 mM Tris pH 8.0, 100 mM NaCl, [...]); Buffer I (TE buffer with 10 mM MgCl2, 1 mM DTT, [...] placental RNAse inhibitor, 0.1 U/µl RNAse-free DNAse I); Stop buffer (50 mM EDTA, [...] M NaOAc, 1.0% SDS).
Slot blot analysis

For slot blot analysis, 10 µg of cytoplasmic RNA was dissolved in 50 µl dH2O, to which 150 µl of 10x SSC/18% formaldehyde were added. The solubilized RNA was then incubated at 65°C for 15 min and spotted onto nitrocellulose with a slot blot apparatus. Radioactively labelled probes of 1.5 kb gp120IIIb and syngp120mn fragments were used for hybridization. Each of the two fragments was random labelled in a 50 µl reaction with 10 µl of 5x oligo-labelling buffer, 8 µl of 2.5 mg/ml BSA, 4 µl of α-[32P]dCTP (20 µCi/µl; 6000 Ci/mmol), and 5 U of Klenow fragment. After 1 to 3 hours of incubation at 37°C, 100 µl of TE were added and unincorporated α-[32P]dCTP was eliminated using a G50 spin column. Activity was measured in a Beckman beta-counter, and equal specific activities were used for hybridization. Membranes were prehybridized for 2 hours and hybridized for 12 to 24 hours at 42°C with 0.5 x 10^6 cpm probe per ml hybridization fluid. The membrane was washed twice (5 min) with washing buffer I at room temperature, for one hour in washing buffer II at 65°C, and then exposed to x-ray film. Similar results were obtained using a 1.1 kb NotI/SfiI fragment of pCDM7 containing the 3' untranslated region. Control hybridizations were done in parallel with a random-labelled human beta-actin probe. RNA expression was quantitated by scanning the hybridized nitrocellulose membranes with a Molecular Dynamics phosphorimager.
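Two calculations recur in the slot blot protocol: the volume of labelled probe needed to reach 0.5 x 10^6 cpm per ml of hybridization fluid, and normalization of phosphorimager signals to the β-actin control. The sketch below is a minimal illustration. The function names are hypothetical, and the counts in the usage lines are placeholders rather than data from these experiments.

```python
def probe_volume_ul(probe_cpm_per_ul: float, hyb_ml: float,
                    target_cpm_per_ml: float = 0.5e6) -> float:
    """Volume of labelled probe giving target_cpm_per_ml in hyb_ml of fluid."""
    return target_cpm_per_ml * hyb_ml / probe_cpm_per_ul


def relative_expression(target_a: float, actin_a: float,
                        target_b: float, actin_b: float) -> float:
    """Fold difference of sample A over sample B after beta-actin normalization."""
    return (target_a / actin_a) / (target_b / actin_b)


# Placeholder numbers only.
print(probe_volume_ul(probe_cpm_per_ul=1e5, hyb_ml=10))  # 50.0
print(relative_expression(200, 100, 100, 100))           # 2.0
```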

The following solutions were used in this procedure: 5x Oligo-labelling buffer (250 mM Tris HCl, pH 8.0, 25 mM MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, 2 mM dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6); Hybridization Solution ([...] M sodium phosphate, 250 mM NaCl, 7% SDS, 1 mM EDTA, 5% dextran sulfate, 50% formamide, 100 µg/ml denatured salmon sperm DNA); Washing buffer I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC, 0.1% SDS); 20x SSC (3 M NaCl, 0.3 M Na3 citrate, pH adjusted to 7.0).
5 Vaccinia renom^n^ j ?n 

Vaccinia recombination used a modification of the
method described by Romeo and Seed (Romeo and
Seed, Cell, 64:1037, 1991). Briefly, CV1 cells at 70 to
90% confluency were infected with 1 to 3 µl of a wildtype
vaccinia stock WR (2 × 10^… pfu/ml) for 1 hour in culture
medium without calf serum. After 24 hours, the cells
were transfected by calcium phosphate with 25 µg TKG
plasmid DNA per dish. After an additional 24 to 48 hours
the cells were scraped off the plate, spun down, and
resuspended in a volume of 1 ml. After 3 freeze/thaw
cycles, trypsin was added to 0.05 µg/ml and lysates were
incubated for 20 min. A dilution series of 10, 1 and 0.1
µl of this lysate was used to infect small dishes (6 cm)
of CV1 cells that had been pretreated with 12.5 µg/ml
mycophenolic acid, 0.25 mg/ml xanthine and 1.36 mg/ml
hypoxanthine for 6 hours. Infected cells were cultured
for 2 to 3 days and subsequently stained with the
monoclonal antibody NEA9301 against gp120 and an alkaline
phosphatase-conjugated secondary antibody. Cells were
incubated with 0.33 mg/ml NBT and 0.16 mg/ml BCIP in AP
buffer and finally overlaid with 1% agarose in PBS.
Positive plaques were picked and resuspended in 100 µl
Tris pH 9.0. The plaque purification was repeated once.
To produce high-titer stocks the infection was slowly
scaled up. Finally, one large plate of HeLa cells was
infected with half of the virus from the previous round.
Infected cells were detached in 3 ml of PBS, lysed with a
Dounce homogenizer and cleared of larger debris by
centrifugation. VPE-8 recombinant vaccinia stocks were
kindly provided by the AIDS repository, Rockville, MD,
and express HIV-1 IIIB gp120 under the 7.5 mixed
early/late promoter (Earl et al., J. Virol., 65:31,
1991). In all experiments with recombinant vaccinia, cells
were infected at a multiplicity of infection of at least
10.
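The plaque dilution series and the requirement of a multiplicity of infection (MOI) of at least 10 both reduce to simple arithmetic. The sketch below assumes the usual definitions: titer is plaques per volume plated, and MOI is plaque-forming units per cell. It also assumes a Poisson model for the fraction of cells that receive at least one infectious particle. All numbers and function names are hypothetical.

```python
import math


def titer_pfu_per_ml(plaques, lysate_ul):
    """Titer estimated from a plaque count obtained with lysate_ul microlitres of lysate."""
    return plaques / (lysate_ul / 1000.0)


def inoculum_ul(moi, cells, titer):
    """Microlitres of a stock of the given titer (pfu/ml) delivering moi pfu per cell."""
    return moi * cells / titer * 1000.0


def fraction_infected(moi):
    """Poisson estimate of the fraction of cells receiving at least one pfu."""
    return 1.0 - math.exp(-moi)


# Hypothetical values for illustration only.
titer = titer_pfu_per_ml(plaques=50, lysate_ul=0.1)
print(f"titer ~ {titer:.1e} pfu/ml")
print(f"{inoculum_ul(10, 1e6, 1e9):.0f} ul of a 1e9 pfu/ml stock for 1e6 cells at MOI 10")
print(f"fraction infected at MOI 10 ~ {fraction_infected(10):.5f}")
```

Under the Poisson assumption, an MOI of 10 leaves only about e^-10 (roughly 0.005%) of cells uninfected. This is one way to read the "at least 10" requirement.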

The following solution was used in this procedure:
AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM
MgCl2).

Cell culture

The monkey kidney cell lines CV1 and Cos7, the human
embryonic kidney cell line 293T, and the human cervical
carcinoma cell line HeLa were obtained from the American
Type Culture Collection and were maintained in
supplemented IMDM. They were kept on 10 cm tissue
culture plates and typically split 1:5 to 1:20 every 3 to
4 days. The following medium was used in this
procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco's medium,
10% calf serum, iron-complemented, heat-inactivated 30
min at 56°C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamycin,
0.5 mM β-mercaptoethanol (pH adjusted with 5 M NaOH,
0.5 ml)).

Transfection 

Calcium phosphate transfection of 293T cells was
performed by slowly adding 10 µg plasmid DNA in 250 µl
0.25 M CaCl2 to the same volume of 2x HEBS buffer while
vortexing. After incubation for 10 to 30 min at room
temperature, the DNA precipitate was added to a small dish
of 50 to 70% confluent cells. In cotransfection
experiments with rev, cells were transfected with 10 µg
gp120IIIb, gp120IIIbrre, syngp120mnrre or
rTHY-1enveglrre and 10 µg of pCMVrev or CDM7 plasmid DNA.
The following solutions were used in this procedure:
2x HEBS buffer (280 mM NaCl, 10 mM KCl, …, sterile
filtered); 0.25 M CaCl2 (autoclaved).
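Mixing the DNA/CaCl2 solution with an equal volume of 2x HEBS doubles the volume and halves every concentration. The short sketch below shows that arithmetic for the amounts given in the Transfection paragraph: 10 µg DNA in 250 µl of 0.25 M CaCl2. The function name is hypothetical.

```python
def mix_equal_volumes(dna_ug, cacl2_m, volume_ul):
    """Final concentrations after adding volume_ul of DNA/CaCl2 solution to an
    equal volume of 2x HEBS (the total volume doubles)."""
    total_ul = 2 * volume_ul
    return {
        "total_ul": total_ul,
        "cacl2_mM": cacl2_m * 1000 * volume_ul / total_ul,
        "hebs_x": 2.0 * volume_ul / total_ul,
        "dna_ug_per_ml": dna_ug / (total_ul / 1000.0),
    }


print(mix_equal_volumes(dna_ug=10, cacl2_m=0.25, volume_ul=250))
```

With these inputs, the precipitate forms in 500 µl of 1x HEBS containing 125 mM CaCl2 and 20 µg/ml DNA.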
Immunoprecipitation

After 48 to 60 hours the medium was exchanged and
cells were incubated for an additional 12 hours in Cys/Met-
free medium containing 200 µCi of 35S-translabel.
Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of the protease
inhibitors leupeptin, aprotinin and PMSF to 2.5, 50 and
100 µg/ml, respectively, 1 ml of supernatant was
incubated at 4°C for 12 hours on a rotator with either
10 µl of packed protein A Sepharose alone (rTHY-1enveglrre)
or with protein A Sepharose and … µg of a purified
CD4/immunoglobulin fusion protein (kindly provided by
Behring; all gp120 constructs). Subsequently the protein
A beads were washed 5 times for 5 to 15 min each time.
… 10 µl of loading buffer containing …; samples were
boiled for 3 min and run on 7% (all gp120 constructs) or
10% (rTHY-1enveglrre) SDS polyacrylamide gels (Tris pH 8.8
buffer in the resolving gel, Tris pH 6.8 buffer in the
stacking gel, Tris-glycine running buffer; Maniatis et al.
1989). Gels were fixed in 10% acetic acid and 10%
methanol, incubated with Amplify for 20 min, dried and
exposed for 12 hours.

The following buffers and solutions were used in
this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM
NaCl, 5 mM CaCl2, 1% NP-40); 5x Running buffer (125 mM
Tris, 1.25 M glycine, 0.5% SDS); Loading buffer (10% …,
… SDS, 4% β-mercaptoethanol, …% bromphenol blue …).

Immunofluorescence

293T cells were transfected by calcium phosphate
coprecipitation and analyzed for surface THY-1 expression
after 3 days. After detachment with 1 mM EDTA/PBS, cells
were stained with the monoclonal antibody OX-7 at a
dilution of 1:250 at 4°C for 20 min, washed with PBS, and
subsequently incubated with a 1:500 dilution of a FITC-
conjugated goat anti-mouse immunoglobulin antiserum.
Cells were washed again, resuspended in 0.5 ml of fixing
solution, and analyzed on an EPICS XL cytofluorometer
(Coulter).

The following solutions were used in this
procedure:

PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM
KH2PO4, pH adjusted to 7.4); Fixing solution (2%
formaldehyde in PBS).
ELISA 

The concentration of gp120 in culture supernatants
was determined using CD4-coated ELISA plates and goat
anti-gp120 antisera in the soluble phase. Supernatants
of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to
remove debris, and incubated for 12 hours at 4°C on the
plates. After 6 washes with PBS, 100 µl of goat anti-
gp120 antiserum diluted 1:200 was added for 2 hours. The
plates were washed again and incubated for 2 hours with a
peroxidase-conjugated rabbit anti-goat IgG antiserum
diluted 1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 µl of substrate solution
containing 2 mg/ml o-phenylenediamine in sodium citrate
buffer. The reaction was stopped with 100 µl of
4 M sulfuric acid. Plates were read at 490 nm with a
Coulter microplate reader. Purified recombinant
gp120IIIb was used as a control. The following buffers
and solutions were used in this procedure: Wash buffer
(0.1% NP40 in PBS); Substrate solution (2 mg/ml o-
phenylenediamine in sodium citrate buffer).
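Purified recombinant gp120IIIb served as the control. The text does not give any dilutions, so the following is an assumption: if the control is titrated to build a standard curve, sample concentrations can be read off by linear interpolation between the flanking standards. The ratio of two constructs then compares their expression levels. Below is a minimal sketch with hypothetical standards and readings. It declines to extrapolate beyond the curve.

```python
from bisect import bisect_left


def interpolate_conc(od, standards):
    """Linear interpolation of concentration from a standard curve.

    standards: list of (concentration, OD490) pairs sorted by increasing OD.
    ODs outside the calibrated range raise ValueError instead of extrapolating.
    """
    concs = [c for c, _ in standards]
    ods = [o for _, o in standards]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard curve; dilute and re-assay")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return concs[i]
    c0, c1, o0, o1 = concs[i - 1], concs[i], ods[i - 1], ods[i]
    return c0 + (c1 - c0) * (od - o0) / (o1 - o0)


# Hypothetical gp120 standards (ng/ml, OD490); not data from this study.
curve = [(0, 0.05), (10, 0.20), (50, 0.80), (100, 1.40), (200, 2.20)]
native = interpolate_conc(0.35, curve)
synthetic = interpolate_conc(1.80, curve)
print(f"native {native:.0f} ng/ml, synthetic {synthetic:.0f} ng/ml, "
      f"ratio {synthetic / native:.1f}")
```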
Use

The synthetic genes of the invention are useful for
expressing a protein normally expressed in mammalian
cells in cell culture (e.g., for production of human
proteins such as hGH, TPA, Factor VII, and Factor IX). The
synthetic genes of the invention are also useful for gene
therapy.



WO 96/09378    PCT/US95/11511



SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(i) APPLICANT: SEED, BRIAN

(ii) TITLE OF INVENTION: OVEREXPRESSION OF MAMMALIAN AND VIRAL
PROTEINS

(iii) NUMBER OF SEQUENCES: 37

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Fish & Richardson
(B) STREET: 225 Franklin Street

(C) CITY: Boston

(D) STATE: Massachusetts

(E) COUNTRY: U.S.A.

(F) ZIP: 02110-2804

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30B

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 08/308,286

(B) FILING DATE: 19-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: CLARK, PAUL T

(B) REGISTRATION NUMBER: 30,162

(C) REFERENCE/DOCKET NUMBER: 00786/226001

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (617) 542-5070

(B) TELEFAX: (617) 542-8906

(C) TELEX: 200154



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CGCGGGCTAG CCACCGAGAA GCTG 24
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 196 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
ACCGAGAACC TGTGCGTCAC CGTGTACTAC CCCCTCCCCC TGTGGAAGAG AGGCCACCAC 60
CACCCTCTTC TGCCCCAGCG ACGCCAAGCC CTACGACACC GAGGTGCACA ACGTGTGGGC 120
CACCCAGCCG TGCGTGCCCA CCGACCCCAA CCCCCAGGAG GTGGAGCTCG TGAACGTGAC 180
CGACAACTTC AACATG 196

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCACCATCTT GTTCTTCCAC ATGTTGAAGT TCTC 34
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
GACCCAGAAC TTCAACATCT GCAAGAACAA CAT 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

TGCAAGAACA ACATGGTGGA CCACATGCAT CAGCACATCA TCACCCTCTG GCACCAGACC 60 

CTCAACCCCT CCGTCAACCT CACCCCCTCT GCGTGACCTC AACTCCACCC ACCTCACCAA 120 

CACCACCAAC ACCAACACAC CACCGCCAAC AACAACACCA ACAGCCACCG CACCATCAAG 180 
GCCGGCGAGA TG 192
(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTTGAACCTG CAGTTCTTCA TCTCGCCGCC CTT 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GAAGAACTGC AGCTTCAACA TCACCACCAG C 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 195 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AACATCACCA CCAGCATCCG CGACAACATG CAGAAGGAGT ACGCCCTGCT GTACAACCTG 60 

GATATCGTGA GCATCGACAA CGACAGCACC ACCTACCGCC TGATCTCCTG CAACACCAGC 120 

GTGATCACCC AGGCCTGCCC CAAGATCAGC TTCGAGCCCA TCCCCATCCA CTACTGCGCC 180 

CCCGCCGGCT TCGCC 195 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CAACTTCTTO TCCOCCCCCA ACCCCCCOCC 30

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
CCGCCCCCCC CCCCTTCCCC ATCCTGAACT CCAACCACAA CAACTTC 47
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCCCACAAGA AGTTCAGCGG CAAGGGCAGC TGCAAGAACG TCAGCACCGT GCAGTGCACC 60
CACCGCATCC GGCCGCTGGT CAGCACCCAG CTCCTGCTGA ACGCCAGCCT GGCCGAGGAG 120
GAGCTGGTCA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CCTCCACCTC 180
AATGAGACCC TGCAGATC 198

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
ACTTGCCACC CGTGCAGTTC ATCTGCACGC TCTC 34

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GACACCGTGC ACATCAACTC CACGCGTCCC 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AACTCCACCC GTCCCAACTA CAACAAGCGC AAGCCCATCC ACATCGGCCC CGCGCGCGCC 
TTCTACACCA CCAAGAACAT CATCGGCACC ATCCTCCACG CCCACTGCAA CATCTCTAGA 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTCGTTCCAC TTGCCTCTAG AGATGTTGCA 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
GCAACATCTC TAGAGCCAAG TGGAACGAC 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 131 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:



CCCAAGTGGA ACGACACCCT OCCCCACATC ^ 

-ACCATCC TCTTCACCAG ACCACCCCCC CCCACCCCCA OAT^TC " 
ACTGCGGCGG C «ATCCTCATC CACACCTTCA 



(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCAGTAGAAG AATTCCCCGC CGCAGTTCA 29
(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
TCAACTCCCC CGGCGAATTC TTCTACTGC 29
(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 195 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CCCCAATTCT TCTACTCCAA CACCACCCCC CTGTTCAACA ee*,—- 
ACCTOZAm CTGTTCAACA CCACCTCGAA CCCCAACAAC 

ACCTGGAACA ACACCACCGG CAGCAACAAC AATATTACCC TCCACTCCAA GATCAAGCAG 12Q , 

A =CA TGTGGCAGGA GGTGGGCAAG GCCATGTACG CCCCCCCCAT eo 
ATCCCCTCCA GCAGC 180 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
GCAGACCGGT GATGTTGCTG CTGCACCGGA TCTGGCCCTC 40 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CGAGGGCCAG ATCCGGTGCA GCAGCAACAT CACCGCTCTG 40 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

AACATCACCG GTCTGCTGCT GCTCCTGACC CGGACGGCGG CAAGGACACC GACACCAACG 60 

ACACCGAAAT CTTCCGCGAC GGCGGCAAGG ACACCAACGA CACCGAAATC TTCCGCCCCG 120 

GCGGCGGCGA CATGCGCGAC AACTGGAGAT CTGAGCTGTA CAAGTACAAG GTGGTGACGA 180 

TCCACCCCCT GGCCGTGGCC CCCACCAAGG CCAAGCCCGC GGTGGTGCAG CGCGACAAGC 240 

GC 242 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCCCGGCGCC CQCTTTACCO CTTCTCGCCC TCCACCAC 38

(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
CCCCCCCCAT CCAACCTTAC CATCATTCCA CTAATAAGT 39
(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 165 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TAATAACTAT TACAGGACAA 60 

AGAGTAATAA GTTTAACAGC ATCTTTAGTA AATCAAAATT TGAGATTAGA TTGT^ 

O.AAATAATA CAAATTTGCC AATACAACAT GAATTTTCAT TAACC 

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CCCGCCCAAT TCACCCCTTA ATCAAAATTC ATGTTG 36
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CCCCGATCCA CGCCTCAAAA AAAAAAACAT 30



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 149 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CGTGAAAAAA AAAAACATCT ATTAAGTGGA ACATTAGGAG TACCAGAACA TACATATAGA 60 
AGTAGAGTAA TTTGTTTAGT GATAGATTCA TAAAAGTATT AACATTAGCA AATTTTACAA 120 
CAAAAGATGA AGGAGATTAT ATGTCTGAG 149 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CGCGAATTCG AGCTCACACA TATAATCTCC 30
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CCCGGATCCG AGCTCAGAGT AAGTGGACAA 30 
(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
CTCAGAGTAA CTGCACAAAA TCCAACAACT AGTAATAAAA CAATAAATGT AATAAGAGAT 60
AAATTAGTAA AATGTGAGGA ATAAGTTTAT TAGTACAAAA TACAAGTTGG TTATTATTAT 120
TATTATTAAG TTTAAGTTTT TTACAAGCAA CAGATTTTAT AAGTTTATGA 170
(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:
CCCCAATTCG CGGCCGCTTC ATAAACTTAT AAAATC 36
(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1632 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
CTCGAGATCC ATTGTGCTCT AAACGAGATA CCCGGCCAGA CACCCTCACC TGCGGTGCCC 60
AGCTCCCCAG CCTGAGGCAA GAGAAGCCCA GAAACCATGC CCATGGGGTC TCTGCAACCG 120
CTCGCCACCT TGTACCTGCT GGGGATGCTG GTCGCTTCCG TGCTACCCAC CCAGAAGCTG 180
TGGCTGACCG TCTACTACCC CCTGCCCGTG TGGAAOCAGC CCACCACCAC CCTCTTCTCC 240
CCCAGCGACG CCAAGGCGTA CGACACCGAG GTGCACAACG TGTGCGCCAC CCAGGCGTGC 300
GTCCCCACCG ACCCCAACCC CCACGAGCTG GACCTCGTGA ACGTCACCGA GAACTTCAAC 360
ATGTGGAAGA ACAACATGGT GGAGCAGATG CATGAGGACA TCATCAGCCT GTGGGACCAC 420
AGCCTGAAGC CCTGCGTGAA GCTGACCCCC CTGTGCCTGA CCCTGAACTG CACCGACCTG 480
AGGAACACCA CCAACACCAA CAACAGCACC GCCAACAACA ACAGCAACAG CGAGGGCACC 540
ATCAAGGCCG CCCAGATGAA CAACTCCACC TTCAACATCA CCACCAGCAT CCCCCACAAC 600
ATGCAGAAGG AGTACCCCCT GCTGTACAAG CTGGATATCC TGAGCATCGA CAACCACACC 660
ACCAGCTACC GCCTGATCTC CTCCAACACC AGCCTGATCA CCCAGGCCTC GCCCAACATC 720
AGCTTCGAGC CCATCCCCAT CCACTACTGC GCCCCCGCCG GCTTCGCCAT CCTGAAGTGC 780
AACGACAACA AGTTCAGCGG CAACGGCACC TGCAAGAACC TGAGCACCCT GCAGTGCACC 840

CACGGCATCC GCCCCGTGCT GACCACCCAG CTCCTGCTCA ACGGCAGCCT GGCCGAGGAG 900 

GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT CGTGCACCTG 960 

AATCftGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA ACAAGCGCAA GCGCATCCAC 1020 

ATCGGCCCC& GGCCCGCCTT CTACACCACC AAGAACATCA TCCGCACCAT CCCCCAGGCC 1080 

CACTGCAAC& TCTCTAGAGC CAAGTGGAAC GACACCCTGC GCCAGATCCT GAGCAAGCTG 1140 

AAGGACCAGT TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGC CGACCCCGAG 1200 

ATCGTGATCC ACACCTTCAA CTCCCGCGGC GAATTCTTCT ACTCCAACAC CAGCCCCCTG 1260 

TTCAACAGCA CCTCGAACGG CAACAACACC TGGAACAACA CCACCCGCAG CAACAACAAT 1320 

ATTACCCTCC AGTGCAAGAT CAAGCAGATC ATCAACATGT GGCAGGAGCT GGGCAAGGCC 1380 

ATGTACGCCC CCCCCATCGA GGGCCAGATC CGGTGCAGCA GCAACATCAC CGCTCTCC73 1440 

CTGACCCGCG ACGGCGGCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCGCCCCGGC 1500 

CCCGGCGACA TGCGCGACAA CTGGAGATCT GAGCTGXACA AGTACAAGGT GGTGACGATC 1560 

GAGCCCCTGG GCCTGGCCCC CACCAACGCC AAGCGCCGCG TGGTGCACCG CCAGAAGCGC 1620 

TAAAGCGGCC CC 1632 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2481 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ACCGAGAAGC TGTGGGTGAC CGTGTACTAC GGCGTGCCCG TCTGGAAGGA GGCCACCACC 60
ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCG AGGTGCACAA CGTGTGGCCC 120
ACCCAGCCGT CCGTCCCCAC CGACCCCAAC CCCCAGGAGG TGCAGCTCCT GAACGTGACC 180
GAGAACTTCA ACATGTGGAA CAACAACATG CTGGAGCAGA TGCATGACGA CATCATCAGC 240
CTGTGGGACC AGAGCCTGAA GCCCTGCGTG AAGCTGACCC CCCTGTGCGT GACCCTGAAC 300
TGCACCGACC TGAGGAACAC CACCAACACC AACAACAGCA CCGCCAACAA CAACAGCAAC 360
AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA CCTTCAACAT CACCACCAGC 420
ATCCGCGACA AGATGCAGAA GGAGTACGCC CTGCTGTACA AGCTGGATAT CGTGAGCATC 480
CACAACGACA GCACCAGCTA CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC 540
TGCCCCAAGA TCAGCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CCGCTTCGCC 600
ATCCTGAAGT GCAACGACAA GAACTTCAGC GGCAAGGGCA GCTCCAAGAA CCTGACCACC 660
GTCCAGTGCA CCCACGCCAT CCGGCCGGTG GTGAGCACCC AGCTCCTGCT GAACCGCAGC 720
CTGGCCCAGG AGCAGCTGGT GATCCGCAGC GAGAACTTCA CCGACAACGC CAAGACCATC 780
ATOCTGCACC TCAATGACAG CCTCCACATC AACTGCACGC GTCCCAACTA CAACAAGCGC 840
AAGCGCATCC ACATCGGCCC CGCGCGCGCC TTCTACACCA CCAAGAACAT CATCCCCACC 900
ATCOGCCACC CCCACTGCAA CATCTCTAGA GCCAAGTGGA ACGACACCCT CCCCCACATC 960
CTGAGCAACC TGAACGAGCA GTTCAAGAAC AAGACCATCG TGTTCAACCA CAGCACCGCC 1020
CGCCACCCCG AGATCGTGAT GCACAGCTTC AACTGCGCCG CCCAATTCTT CTACTGCAAC 1080
ACCACCCCCC TCTTCAACAG CACCTGCAAC GGCAACAACA CCTCGAACAA CAGCACCGCC 1140
AGCAACAACA ATATTACCCT CCAGTCCAAC ATCAAGCAGA TCATCAACAT GTGGCAGGAG 1200
CTGCCCAAGC CCATCTACCC CCCCCCCATC GAGGGCCAGA TCCGGTGCAG CAGCAACATC 1260
ACCGCTCTCC TGCTCACCCG CGACCGCCCC AAGGACACCG ACACCAACCA CACCCAAATC 1320
TTCCCCCCCG GCGGCGGCGA CATGCGCCAC AACTGGAGAT CTGAGCTGTA CAAGTACAAG 1380
CTGCTCACGA TCGACCCCCT GGGCGTGGCC CCCACCAACG CCAACCCCCG CGTCGTCCAC 1440
CCCCACAAGC CGCCCCCCAT CGGCGCCCTG TTCCTCGCCT TCCTGCCCCC GGCCCGCAGC 1500
ACCATGGGGG CCGCCACCGT GACCCTGACC CTGCAGGCCC GCCTGCTCCT CACCGGCATC 1560
GTGCAGCAGC AGAACAACCT CCTCCGCCCC ATCGAGCCCC ACCACCATAT GCTCCACCTC 1620
ACCCTCTCGG CCATCAAGCA CCTCCAGGCC CCCCTCCTGC CCGTGGACCC CTACCTCAAC 1680
CACCAGCACC TCCTCGCCTT CTGGGGCTGC TCCCGCAAGC TGATCTGCAC CACCACCGTA 1740
CCCTCGAACG CCTCCTGGAG CAACAACAGC CTGCACCACA TCTGCAACAA CATCACCTCG 1800
ATGCAGTGCG AGCGCGAGAT CGATAACTAC ACCAGCCTCA TCTACAGCCT GCTGGAGAAG 1860
AGCCAGACCC AGCAGGAGAA GAACGAGCAG CAGCTCCTGC ACCTGGACAA CTGGCCGACC 1920
CTCTCCAACT CCTTCCACAT CACCAACTCG CTCTCCTACA TCAAAATCTT CATCATCATT 1980
CTGCGCGGCC TCGTGGCCCT CCGCATCGTG TTCGCCGTCC TCACCATCGT CAACCCCCTC 2040
CCCCAGCCCT ACAGCCCCCT CACCCTCCAG ACCCCCCCCC CCCTGCCGCC CGGGCCCCAC 2100
CGCCCCGAGG GCATCCACCA GCACGCCGGC GACCGCCACC GCGACACCAG CCCCAGGCTC 2160
CTGCACCCCT TCCTGGCGAT CATCTGGGTC GACCTCCGCA CCCTGTTCCT CTTCACCTAC 2220
CAGCACCGCC ACCTGCTGCT GATCGCCGCC CCCATCCTGC AACTCCTAGG CCCCCGCGCC 2280
TCCCAGGTGC TGAAGIACTG GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAAGTCC 2340
AGCCCCCTCA CCCTGCTGAA CCCCACCGCC ATCGCCCTGG CCGACGGCAC CGACCGCCTG 2400
ATCGACGTCC TCCAGACCCC CCGCAGCGCG ATCCTGCACA TCCCCACCCG CATCCCCCAC 2460
CCCCTCCACA CCGCGCTGCT C 2481

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 486 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

ATGAATCCAG TAATAAGTAT AACATTATTA TTAAGTGTAT TACAAATGAC TAGAGGACAA 60 

AGAGTAATAA GTTTAACAGC ATCTTTACTA AATCAAAATT TGAGATTAGA TTGTAGACAT 120 

GAAAATAATA CACCTTTGCC AATACAACAT GAATTTTCAT TAACGCGTGA AAAAAAAAAA 180 

CATGTATTAA GTGGAACATT AGGAGTACCA GAACATACAT ATAGAAGTAG AGTAAATTTC 240 

TTTAGTGATA GATTCATAAA AGTATTAACA TTAGCAAATT TTACAACAAA AGATGAAGGA 300 

CATTATATCT GTGACCTCAG ACTAAGTGGA CAAAATCCAA CAACTAGTAA TAAAACAATA 360 

AATGTAATAA GAGATAAATT AGTAAAATCT GCAGGAATAA CTTTATTAGT ACAAAATACA 420 

AGTTGGTTAT TATTATTATT ATTAAGTTTA AGTTTTTTAC AAGCAACAGA TTTTATAAGT 480 

TTATGA 486
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 485 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

ATGAACCCAG TCATCAGCAT CACTCTCCTO CTTTCAGTCT TGCAGATGTC CCGAGGACAG 60 

AGGGTGATCA GCCTGACAGC CTGCCTCGTC AACAGAACCT TCCACTGGAC TGCCGTCATG 120 

AGAATAACAC CAACTTGCCC ATCCAGCATG AGTTCAGCCT GACCCGAGAG AAGAAGAAGC 180 

ACGTGCTGTC AGGCACCCTG GGGGTTCCCG AGCACACTTA CCGCTCCCGC GTCAACCTTT 240 

TCAGTGACCG CTTTATCAAG GTCCTTACTC TAGCCAACTT GACCACCAAG GATGAGGGCG 300 

ACTACATGTG TCAACTTCGA GTCTCGGGCC AGAATCCCAC AAGCTCCAAT AAAACTATCA 360 

ATGTGATCAG AGACAAGCTG GTCAAGTGTG GTGGCATAAG CCTGCTGGTT CAAAACACTT 420 

CCTGGCTGCT CCTGCTCCTC CTTTCCCTCT CCTTCCTCCA AGCCACGGAC TTCATTTCTC 480 

TGTGA 485 

What is claimed is: 
1. A synthetic gene encoding a protein normally
expressed in mammalian cells wherein at least one
non-preferred or less preferred codon in the natural gene
encoding said mammalian protein has been replaced by a
preferred codon encoding the same amino acid.

2. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 110% of that
expressed by said natural gene in an in vitro mammalian
cell culture system under identical conditions.

3. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

4. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

5. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.

6. The synthetic gene of claim 1 wherein said
synthetic gene is capable of expressing said mammalian
protein at a level which is at least ten times that
expressed by said natural gene in an in vitro cell
culture system under identical conditions.
7. The synthetic gene of claim 1 wherein at least
10% of the codons in said natural gene are non-preferred
codons.

8. The synthetic gene of claim 1 wherein at least
50% of the codons in said natural gene are non-preferred
codons.

9. The synthetic gene of claim 1 wherein at least
50% of the non-preferred codons and less preferred codons
present in said natural gene have been replaced by
preferred codons.

10. The synthetic gene of claim 1 wherein at
least 90% of the non-preferred codons and less preferred
codons present in said natural gene have been replaced by
preferred codons.

11. The synthetic gene of claim 1 wherein said
protein is a retroviral or lentiviral protein.

12. The synthetic gene of claim 11 wherein said
protein is an HIV protein.

13. The synthetic gene of claim 12 wherein said
protein is selected from the group consisting of gag,
pol, and env.

14. The synthetic gene of claim 13 wherein said
protein is gp120 or gp160.

15. The synthetic gene of claim 1 wherein said
protein is a human protein.
16. A method for preparing a synthetic gene
encoding a protein normally expressed by mammalian cells,
comprising identifying non-preferred and less-preferred
codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and less-
preferred codons with a preferred codon encoding the same
amino acid as the replaced codon.
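The procedure of claim 16 can be illustrated with a minimal sketch. It assumes the standard genetic code and the preferred and less preferred codon lists given in the Summary of the Invention. That list, as reproduced in this copy, has no entry for Glu, and Met and Trp each have a single codon, so the sketch leaves those codons unchanged. Stop codons are also left unchanged. It replaces every replaceable codon, which is the limiting case of claims 9 and 10. It also reports the share of non-preferred codons in the input, the quantity referred to in claims 7 and 8. All function names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Preferred and less preferred codons as listed in the Summary of the Invention.
PREFERRED = {"A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
             "Q": "CAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG",
             "K": "AAG", "P": "CCC", "F": "TTC", "S": "AGC", "T": "ACC",
             "Y": "TAC", "V": "GTG"}
LESS_PREFERRED = {"GGG", "ATT", "CTC", "TCC", "GTC"}


def codons(cds):
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def classify(codon):
    """Everything outside the two lists counts as non-preferred (incl. ATG, TGG)."""
    if codon in PREFERRED.values():
        return "preferred"
    if codon in LESS_PREFERRED:
        return "less preferred"
    return "non-preferred"


def optimize(cds):
    """Swap each codon for the preferred codon of the same amino acid, if one is listed."""
    out, replaced = [], 0
    for codon in codons(cds):
        target = PREFERRED.get(CODE[codon], codon)  # no listed preference: keep codon
        if target != codon:
            replaced += 1
        out.append(target)
    return "".join(out), replaced


def translate(cds):
    return "".join(CODE[c] for c in codons(cds))


cds = "ATGAATCCAGTAATAAGTATAACATTATTATTAAGTGTATTACAAATGACTAGAGGACAA"  # first 60 nt of SEQ ID NO:36
synthetic, n = optimize(cds)
share = sum(classify(c) == "non-preferred" for c in codons(cds)) / len(codons(cds))
print(synthetic, n, f"{share:.0%} non-preferred in the input")
assert translate(synthetic) == translate(cds)  # the encoded protein is unchanged
```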
FIG. 1
FIGURE 5